Google Summer of Code served as my gateway to the vast and exciting world of open source. Why is open source exciting? Because your contribution might change the way the world thinks and benefit millions of people across the globe. What's even more exciting? Your PRs are public. Anyone can come along and say, "Oh, here's the PR. That's great." As my mentor Brent said, "It's super satisfactory unless we mess things up, which will also be public :3".

A little bit about myself:

Hey everyone, I'm Aritra, a senior pursuing my B.E. degree in Electrical Engineering at Jadavpur University. A year ago, I was a competitive programmer with little knowledge of OOP and DBMS and no knowledge of web development. And here I am today, having gained only a little knowledge of web development :3.
In my leisure time, I play football, watch soccer matches (mostly Real Madrid) and anime, and read manga.
If you want to know me better, you might want to visit my LinkedIn profile or GitHub profile, or go through my portfolio. There you will find numerous ways to get in touch with me. I will be writing more blog posts on GSoC; you can find them here.
Enough introduction. It's time to share my GSoC acceptance experience.



Google Summer of Code

Pre-GSoC period:

The Google Summer of Code programme has thousands of projects under hundreds of organizations from many countries. I went through many mathematics- and development-based projects under organizations like Git, Mathesar, Django, Postman, SymPy and SageMath. Finally, I shortlisted around four projects from Mathesar, SymPy and Postman. Generally, it's better to go with projects that align with your tech stack so that you can focus on contributing to the project. Some organizations (including Mathesar, the organization I applied to) ask for a PR (not necessarily merged) before they consider you for GSoC. So, along with your proposal, PRs are also important. Remember this!




Mathesar

My Introduction to Mathesar:

Once I had selected the projects, it was time to understand the relevant part of the codebase and discuss it with the community (that's why it is called the "Community Bonding Period"). Where do you find the community's discussion forum? Generally, each organization's GSoC project page links to its mailing list and/or public communication channels.
Mathesar's discussion forum was (and still is) super active. I joined Mathesar's public channel two weeks before the proposal deadline. Thanks to the mentors and fellow applicants, within a week I had gained a basic understanding of the relevant part of the codebase and the API flows, and my first PR got merged :-) After that, with a week in hand, I started writing proposals and finally submitted two proposals for two projects under Mathesar.
On the project announcement day, I got two rejection emails from Google (due to some bugs, applicants were receiving multiple rejection emails, so I thought both of my proposals had been rejected). I was a little sad until Kriti told me I had indeed been selected and should wait a bit. Soon, I received the acceptance mail from Google.

An overview of my project, Add more Summarization Functions:

Mathesar is a tool that aims to make database assets accessible to non-technical users. When it comes to data, analysis and inference are of utmost importance, and summarization functions serve exactly this purpose. To get a very brief overview of the data, we use the mean. To get the most frequent value, we use the mode. To measure spread, we use the standard deviation. Similarly, the median gives a more robust picture of the centre when there are outliers. But these functions are very common and mainly apply to numeric data.
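To make the numeric case concrete, here is a small illustrative sketch using Python's standard `statistics` module. It is not Mathesar's implementation; the helper name `summarize_numeric` and the sample column are hypothetical, chosen to show how a single outlier pulls the mean away from the median.

```python
import statistics

def summarize_numeric(values):
    """Hypothetical helper: common summaries for a list of numbers.

    Uses the sample standard deviation (statistics.stdev).
    """
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),
    }

# Illustrative data: one outlier (200) drags the mean far above the median.
column = [12, 15, 15, 18, 20, 200]
for name, value in summarize_numeric(column).items():
    print(f"{name:>6}: {value:.2f}")
```

Here the median (16.5) stays close to most of the values, while the mean is pulled up by the single outlier.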
Let's take an array of words. What summarizations might be helpful? Word count (counting how often each word occurs and ranking words by frequency) might be a good summarization function. However, words like 'is', 'of' and 'the' are very frequent but not informative. Ranking terms by TF-IDF (Term Frequency-Inverse Document Frequency), which down-weights words that appear in almost every document, might be the better choice.
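The contrast between raw word counts and TF-IDF can be sketched in plain Python. This assumes each cell of a hypothetical text column is one "document" and uses the textbook formulas tf(t, d) = count / words in d and idf(t) = log(N / df(t)); libraries such as scikit-learn use smoothed variants, and none of this reflects how Mathesar will actually implement it.

```python
import math
from collections import Counter

def word_counts(docs):
    """Count every word across all documents, most frequent first."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    return counts.most_common()

def tfidf_ranking(docs):
    """Rank words by their highest TF-IDF score in any single document.

    tf(t, d) = occurrences of t in d / number of words in d
    idf(t)   = log(N / df(t)), where df(t) = documents containing t
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(word for words in tokenized for word in set(words))
    best = {}
    for words in tokenized:
        for word, count in Counter(words).items():
            score = (count / len(words)) * math.log(n_docs / doc_freq[word])
            best[word] = max(best.get(word, 0.0), score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

# Hypothetical text column: each cell is treated as one document.
column = [
    "the cat is on the mat",
    "the dog is in the garden",
    "the cat chased the dog",
]
print(word_counts(column)[:3])
print(tfidf_ranking(column)[:3])
```

"the" tops the raw word count, but because it appears in every document its IDF is log(1) = 0, so it drops to the bottom of the TF-IDF ranking.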
When summarizing JSON columns, the union of keys, the combined keys and values, and the intersection of keys might be good summarization functions.
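A minimal sketch of these three JSON summaries, assuming each row of a hypothetical JSON column holds one object. The helper names and sample rows are illustrative only; in Mathesar such aggregations would more likely run inside PostgreSQL.

```python
import json

def key_union(objects):
    """Sorted keys that appear in at least one object."""
    return sorted(set().union(*(obj.keys() for obj in objects)))

def key_intersection(objects):
    """Sorted keys that appear in every object."""
    key_sets = [set(obj) for obj in objects]
    return sorted(set.intersection(*key_sets)) if key_sets else []

def merged_key_values(objects):
    """Map each key to the distinct values seen for it, in order of appearance."""
    merged = {}
    for obj in objects:
        for key, value in obj.items():
            values = merged.setdefault(key, [])
            # A list (not a set) tolerates unhashable JSON values like arrays.
            if value not in values:
                values.append(value)
    return merged

# Hypothetical JSON column: one object per row.
rows = [
    '{"id": 1, "city": "Kolkata"}',
    '{"id": 2, "tags": ["a", "b"]}',
    '{"id": 3, "city": "Pune", "tags": ["a", "b"]}',
]
objects = [json.loads(row) for row in rows]
print(key_union(objects))         # ['city', 'id', 'tags']
print(key_intersection(objects))  # ['id']
print(merged_key_values(objects))
```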
Similarly, when summarizing time values, modular arithmetic is an obvious choice, since times of day wrap around at midnight. Merging all the arrays in an array column into a single array would be a good summarization function as well.
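One way to apply modular arithmetic to times of day is a circular mean: map each time onto a 24-hour clock face, average the unit vectors, and map the resulting angle back modulo one day. This is an assumed interpretation shown for illustration, not a confirmed part of the project; the helper names and the minutes-since-midnight representation are hypothetical. The sketch also includes a trivial array-merging helper.

```python
import math
from itertools import chain

MINUTES_PER_DAY = 24 * 60

def mean_time_of_day(minutes):
    """Circular mean of times of day given as minutes since midnight."""
    angles = [2 * math.pi * m / MINUTES_PER_DAY for m in minutes]
    sin_sum = sum(math.sin(a) for a in angles)
    cos_sum = sum(math.cos(a) for a in angles)
    mean_angle = math.atan2(sin_sum, cos_sum)
    # Convert the angle back to minutes and wrap into [0, MINUTES_PER_DAY).
    return (mean_angle * MINUTES_PER_DAY / (2 * math.pi)) % MINUTES_PER_DAY

def to_hhmm(minutes):
    """Format minutes since midnight as HH:MM, wrapping at midnight."""
    total = round(minutes) % MINUTES_PER_DAY
    return f"{total // 60:02d}:{total % 60:02d}"

def merge_arrays(column):
    """Concatenate every array in an array column into a single array."""
    return list(chain.from_iterable(column))

times = [23 * 60 + 30, 30]  # 23:30 and 00:30
print(to_hhmm(sum(times) / len(times)))        # naive mean: 12:00
print(to_hhmm(mean_time_of_day(times)))        # circular mean: 00:00
print(merge_arrays([[1, 2], [3], [], [4, 5]]))  # [1, 2, 3, 4, 5]
```

The naive arithmetic mean of 23:30 and 00:30 lands at noon, while the circular mean correctly lands at midnight. Note that the circular mean is undefined when the vectors cancel out (for example, 00:00 and 12:00), so a real implementation would need to handle that case.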
I will be working on adding such summarization functions to Mathesar's UI this summer. If you have any unique ideas for summarization functions, please ping me. If they are relevant, I will discuss them with my mentors, Brent and Sean, and see if we can implement them.

Acknowledgements:

None of this would have been possible without the support of my parents, friends and teachers. I'm thankful to them.
I am immensely grateful to Srirupa for introducing me to GSoC and supporting me throughout the community bonding period. She will be contributing to Krita this summer (check out her blog here).
Lastly, I would like to thank all the mentors and applicants of the Mathesar organization for their constant support, Dominykas for relentlessly reviewing our draft proposals, and Sean for all his appreciation and encouragement.


