Hey, I'm Aritra, a sophomore pursuing my Master's degree in Computer Science at Chennai Mathematical Institute. I am contributing to Mathesar as part of GSoC'23, my project is to " Add more summarization functions ". In this blog, I will share my work and GSoC experience with my organization.
Here is my LinkedIn profile, GitHub profile and portfolio. There, you will find numerous ways to get in touch with me. I will be writing more blogs on my future journeys; you can find them here.


Project Synopsis:

The aim of my project was to add more aggregation functions to the Mathesar UI, which will let users easily aggregate data based on a grouping column. Here is a list of different aggregation functions that we have added to the UI. (Click on the images to visit the issues)
meta issue
Apart from them, I have worked on fixing some regression bugs related to the summarization.

meta issue

Here are explanations of some of the summarization functions we added:
  1. Support for sum aggregation function while summarizing columns:
    In this PR, I have added the sum summarization function. Well, what does that mean? Here is a video that explains how it works. In short, If you have a grouping column and a column you want to aggregate (or calculate the sum of), then this summarization function will result in a table containing the groups and their sums. Let's assume, We have a table which has columns: player's name, Year, Goals scored in that year. Now, we want to calculate how many goals each player scored throughout his journey. It's easy now, right? Just add players as grouping columns and goals as summarization columns.
  2. Add median aggregation function:
    In this PR, I have added the median summarization functions. How is median defined? If you give an ordered array, median is the middle element of the array after sorting.
  3. Add Peak Time aggregation function:
    In this PR, I have added the peak time summarization functions. Now, what is peak time? - Modular average of time? How is this different from the average? Well, time is cyclic and if two events happen at 11:59 pm and 12:01 am, you would not expect the average time to be 12:00 pm, right? Rather 12:00 am is a good candidate. Here is a blog that explains the modular average. Here is the result of such a summarization.
I have explained part of the project through my blogs, here are the links: Here are some of the future works, that I will try to follow up:
  • Add support for multiple grouping columns.
  • Add support for preprocessing of summarized columns.
Here are the links to all my merged PRs and reported issues since the community bonding period:

Again a thanksgiving:

While working on this project, I have learnt a lot. With Brent and Sean's guidance, nothing felt too challenging. Thanks to Dom, Mukesh, Pavish, Rajat, Ghislaine, Anish and my co-interns Maria, Varsha and Anshuman for helping me in the project and answering all my doubts. Lastly, a special thanks to Kriti for managing this wonderful organization.
All the meetings with Brent and Sean, they were super helpful to me, I have gained in-depth knowledge of Git, Django and Postgres. Those community events, where we played different multiplayer games, I will miss them for sure.
So, thank you Mathesar and Google. I'm glad to be a part of such a great programme.