2025-03-23 Weekly Notes

23 March 2025

After the issues and errors with generating the tree data for all of England, I realised that one of the best ways to proof-check the data was to visualize it and, more importantly, to let others see it so they can spot things that I might be missing. Because of this, I spent a good chunk of time trying to create a web map of all the trees in London. I used CARTO to do it, but it wasn’t straightforward, mostly because of the size of the data. My files were stored in Parquet format, which apparently neither CARTO nor Mapbox can reproject automatically to Web Mercator, so I had to change the CRS in Sedona and re-generate the Parquet partitions. Unfortunately, knowing which area each partition covers is not very easy, so what I did was geo-reference the data using the UK National Grid and then generate the partitions. The result is the map below, where every tree is shown with the radius representing the crown area and the colour representing the height. I want to improve it because at the moment the layers correspond to the partitions of the Parquet files, but I know you can store the files in BigQuery or S3 and then use CARTO to visualize them (anyone willing to help, I’ll buy a beer and, if the weather is good, an ice cream). Nonetheless, I will try that after the paper is done.

Speaking of which, I realised that I didn’t want to just present correlations between the tree data, remote sensing metrics and deprivation. That seems like only descriptive statistics, so, following the advice of Anil from previous meetings, I’ve also worked on a metric that combines all of the variables into one. My original idea was to represent environmental deprivation, as the IMD does, but my variables don’t measure that intrinsically; they just show where more nature is “available”, so I guess it will be more of a nature availability index. It relies on green and blue infrastructure, and it is built on the weights from the first two components of a PCA for the entire country.
The cool thing is that the index can actually tell which geographic areas are urban or rural, and you can see the deprivation pattern in the data. However, the logic behind it is purely additive: it follows the form $y = a_1x_1 + \dots + a_nx_n$, where each $x_i$ is a “nature metric” for a given geographical area and each coefficient $a_i$ is the weight associated with that metric. The result is dimensionless, as the variables were normalised. After this, I would build a Gini coefficient of green disparity per Local Authority (this could also be done at larger geographic scales, such as region). This is based on this paper, but also on comments made by Ronita on how to represent my metrics, as the Gini coefficient is widely used in economics when talking about wealth inequality.
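
To make the two steps concrete, here is a minimal sketch of an additive index followed by a per-authority Gini coefficient. It is an illustration under assumptions, not the actual pipeline: the metric names, the weights, the authority labels and the numbers are all made up, min-max scaling stands in for whatever normalisation was really used, and the weights are simply given rather than derived from a PCA. It also assumes the index values are non-negative, which the Gini formula below needs.

```python
from collections import defaultdict


def min_max_normalise(values):
    """Rescale a list of numbers to the [0, 1] range (constant lists map to 0)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def nature_index(metrics, weights):
    """Additive index y = a_1*x_1 + ... + a_n*x_n for every area.

    metrics: dict of metric name -> list of raw values, one per area.
    weights: dict of metric name -> coefficient a_i (hypothetical values here).
    """
    normalised = {name: min_max_normalise(vals) for name, vals in metrics.items()}
    n_areas = len(next(iter(normalised.values())))
    return [
        sum(weights[name] * normalised[name][i] for name in normalised)
        for i in range(n_areas)
    ]


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values, ranks starting at 1.
    ranked_sum = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * ranked_sum / (n * total) - (n + 1) / n


def gini_by_authority(index_values, authorities):
    """Group index values by Local Authority and compute one Gini per group."""
    groups = defaultdict(list)
    for value, authority in zip(index_values, authorities):
        groups[authority].append(value)
    return {authority: gini(vals) for authority, vals in groups.items()}


# Made-up numbers for four areas in two hypothetical authorities.
metrics = {
    "tree_cover": [10.0, 20.0, 30.0, 50.0],
    "water_area": [0.0, 1.0, 1.0, 4.0],
}
weights = {"tree_cover": 0.6, "water_area": 0.4}  # placeholders, not PCA output
authorities = ["A", "A", "B", "B"]

index = nature_index(metrics, weights)
disparity = gini_by_authority(index, authorities)
```

One caveat with this sketch: if some PCA loadings are negative, the weighted sum can go below zero, and the index would need shifting or rescaling before a Gini coefficient is meaningful.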

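Coming back to the map for a moment, the idea of geo-referencing partitions with the UK National Grid can be sketched in a few lines. This is only an assumption about how such a key could be built, not the Sedona job itself: it assumes coordinates are British National Grid eastings and northings in metres (EPSG:27700), uses the standard two-letter 100 km grid squares as the partition key, and the field names `easting` and `northing` are hypothetical.

```python
from collections import defaultdict


def grid_square(easting, northing):
    """Return the two-letter 100 km National Grid square for a point.

    easting/northing are metres on the British National Grid (EPSG:27700).
    """
    if not (0 <= easting < 700_000 and 0 <= northing < 1_300_000):
        raise ValueError("point is outside the National Grid extent")
    e100k = int(easting // 100_000)
    n100k = int(northing // 100_000)
    # Letter indices on the 5x5 letter layout, counted from the top-left.
    first = (19 - n100k) - (19 - n100k) % 5 + (e100k + 10) // 5
    second = (19 - n100k) * 5 % 25 + e100k % 5
    # The letter 'I' is not used in grid references, so skip over it.
    if first > 7:
        first += 1
    if second > 7:
        second += 1
    return chr(ord("A") + first) + chr(ord("A") + second)


def partition_by_square(trees):
    """Group tree records (dicts with hypothetical easting/northing keys)."""
    groups = defaultdict(list)
    for tree in trees:
        groups[grid_square(tree["easting"], tree["northing"])].append(tree)
    return dict(groups)


# A point in central London falls in square "TQ".
london_square = grid_square(530_000, 180_000)
```

With a key like this stored as a column, each partition name says directly which part of the country it covers; smaller tiles would follow the same idea with a finer division of the coordinates.
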
Finally, on a lighter note, on Saturday morning, after a good tennis session, I went to the Computer Lab to attend a workshop on Sonic Pi by its creator, Sam Aaron, as part of the Cambridge Festival. It turned out that the attendees were just as old as the first movie in the Star Wars sequel trilogy (The Force Awakens, 2015). 😅 However, I ended up enjoying other activities during the festival, namely the Insomia AI project and the earphones that use AI to monitor your heart rate. More importantly, expect my appearance on the CST social media channels as I answer questions on supercomputers, asked by none other than my dear friend and fellow PhD student Onkar Gulati.