GSoC '23: My Introduction to Open-Source Contribution
The Google Summer of Code served as my gateway to the vast and exciting world of open source. Well, why is open source exciting? Because your contribution might change the way the world thinks and benefit millions of people across the globe. What's more exciting? Your PRs are public. People can come and see, "Oh, here's the PR. That's great." As my mentor Brent said, "It's super satisfactory unless we mess things up, which will also be public :3".
A little bit about myself:
Hey everyone, I'm Aritra, a senior pursuing a B.E. degree in Electrical Engineering at Jadavpur University. A year ago, I was a competitive programmer with little knowledge of OOP and DBMS and no knowledge of web development. And here I am today, having gained only a little knowledge of web development :3.
In my leisure time, I play football, watch football matches (mostly Real Madrid), watch anime and read manga.
If you want to know me better, you can visit my LinkedIn profile or GitHub profile, or go through my portfolio, where you will find numerous ways to get in touch with me. I will be writing more blog posts on GSoC; you can find them here.
Enough introduction. It's time to share my GSoC acceptance
experience.
Pre-GSoC period:
The Google Summer of Code programme has thousands of projects under hundreds of organizations from many countries. I went through many mathematics- and development-based projects under organizations like Git, Mathesar, Django, Postman, SymPy and SageMath, and finally shortlisted around four projects from Mathesar, SymPy and Postman. Generally, it's better to go with projects that align with your tech stack so that you can focus on contributing to the project. Some organizations (including Mathesar, the organization I applied to) ask for a PR (not necessarily merged) before they consider you for GSoC. So, along with your proposal, your PRs matter too. Remember this!
My Introduction to Mathesar:
Once I had selected the projects, it was time to understand the relevant parts of the codebase and discuss them with the community. (GSoC's official "Community Bonding Period" only starts after the accepted contributors are announced, but bonding with the community early pays off.) Where can you find a community's discussion forum? Usually, the organization's GSoC page links to its mailing list and/or public communication channels.
Mathesar's discussion forum was (and still is) super active. I joined Mathesar's public channel two weeks before the proposal deadline. Thanks to all the mentors and fellow applicants, within a week I had a basic understanding of the relevant parts of the codebase and the API flows, and my first PR got merged :-) With one week left, I started writing proposals and finally submitted two proposals for two projects under Mathesar.
On the day the results were announced, I got two rejection emails from Google (due to some bugs, applicants were receiving multiple rejection emails, so I thought both of my proposals had been rejected). I was a little sad until Kriti told me that I had indeed been selected and should wait a bit. Soon, I received the acceptance email from Google.
An overview of my project, Add more Summarization Functions:
So, Mathesar is a tool that aims to make databases usable by non-technical people. When it comes to data, analysis and inference are of utmost importance, and summarization functions serve exactly that purpose. To get a quick overview of numeric data, we use the mean. To find the most frequent value, we use the mode. To measure spread (how tightly the values cluster), we use the standard deviation. The median gives a better picture of the center when there are outliers, since a single extreme value can drag the mean far away. But these functions are very common and mostly apply to numeric data.
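To make these four concrete, here is a small standalone sketch using Python's built-in `statistics` module. It is not Mathesar's implementation; `numeric_summary` is a hypothetical helper and the sample column (with one outlier) is made up for illustration.

```python
from statistics import mean, median, mode, stdev

def numeric_summary(values):
    """Common summaries of a numeric column (hypothetical helper)."""
    return {
        "mean": mean(values),      # overall average
        "median": median(values),  # middle value, robust to outliers
        "mode": mode(values),      # most frequent value
        "stdev": stdev(values),    # sample standard deviation (needs >= 2 values)
    }

# One outlier (100) drags the mean far from the bulk of the data,
# while the median stays at the "typical" value.
summary = numeric_summary([2, 3, 3, 4, 100])
print(summary)
```

Here the mean comes out as 22.4 while the median and mode are both 3, which is exactly why the median is preferred when outliers are present.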
Let's take an array of words. What summarizations might be helpful? A word count (counting how often each word occurs and ranking words by frequency) might be a good summarization function. But words like 'is', 'of' and 'the' are very frequent yet not informative. Ranking terms by TF-IDF (Term Frequency-Inverse Document Frequency) might be the better choice, since it down-weights words that appear in almost every entry.
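A rough sketch of the difference, treating each row of a text column as a "document". This is plain Python with naive whitespace tokenization and the textbook tf × log(N/df) weighting; `tfidf_top_terms` is a hypothetical helper, not an existing Mathesar function, and a real implementation would likely also handle punctuation and stop words.

```python
import math
from collections import Counter

def tfidf_top_terms(docs, k=3):
    """Rank terms across a text column by summed TF-IDF (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many rows does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)             # term frequency within this row
            idf = math.log(n_docs / df[term])    # 0 for terms present in every row
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(k)]

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird is on the roof",
]
# Plain word count: the most frequent word is the uninformative "the".
raw_top = Counter(w for d in docs for w in d.lower().split()).most_common(1)[0][0]
# TF-IDF: "the" appears in every row, so its score drops to zero.
top = tfidf_top_terms(docs, k=3)
print(raw_top, top)
```

Because "the" occurs in every row, its inverse document frequency is log(1) = 0, so it never makes the TF-IDF ranking even though it dominates the raw count.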
When summarizing JSON columns, the union of keys, the keys along with their values, and the intersection of keys might be good summarization functions.
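Assuming each cell of a JSON column holds a JSON object, these summaries boil down to set operations on the objects' keys. The sketch below uses Python's `json` module; the helper names are hypothetical, and "keys and values" is interpreted here as collecting every value seen under each key, which is only one possible reading.

```python
import json

def key_union(objects):
    # Keys that appear in at least one object (iterating a dict yields its keys).
    return set().union(*objects)

def key_intersection(objects):
    # Keys that appear in every object; an empty column has no common keys.
    if not objects:
        return set()
    common = set(objects[0])
    for obj in objects[1:]:
        common &= set(obj)
    return common

def values_by_key(objects):
    # One reading of "keys and values": gather every value seen under each key.
    merged = {}
    for obj in objects:
        for key, value in obj.items():
            merged.setdefault(key, []).append(value)
    return merged

column = [
    '{"name": "Ann", "age": 31}',
    '{"name": "Bo", "city": "Pune"}',
    '{"name": "Cy", "age": 25, "city": "Oslo"}',
]
objects = [json.loads(cell) for cell in column]
print(sorted(key_union(objects)))         # ['age', 'city', 'name']
print(sorted(key_intersection(objects)))  # ['name']
print(values_by_key(objects))
```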
Similarly, when summarizing time, modular arithmetic is an obvious choice: times of day wrap around at midnight, so the average of 23:00 and 01:00 should be 00:00, not 12:00. Merging an array column into a single array would be a good summarization function as well.
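Here is one way the time-of-day idea could look: a circular mean that places each time on a 24-hour clock face, averages the resulting unit vectors, and wraps the answer back into range with `%`. Times are assumed to be minutes after midnight, and `mean_time_of_day` and `merge_arrays` are hypothetical helpers for illustration; the array merge is just a flatten with `itertools.chain`.

```python
import math
from itertools import chain

MINUTES_PER_DAY = 24 * 60

def mean_time_of_day(minutes):
    """Circular mean of times of day given as minutes after midnight (hypothetical helper)."""
    # Map each time onto the unit circle so 23:59 and 00:01 end up close together.
    angles = [2 * math.pi * m / MINUTES_PER_DAY for m in minutes]
    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)
    mean_angle = math.atan2(y, x)  # in (-pi, pi]
    # Convert back to minutes, then wrap into [0, MINUTES_PER_DAY) with modular arithmetic.
    return round(mean_angle / (2 * math.pi) * MINUTES_PER_DAY) % MINUTES_PER_DAY

def merge_arrays(column):
    """Concatenate every array in a column into a single list (hypothetical helper)."""
    return list(chain.from_iterable(column))

times = [23 * 60, 1 * 60]            # 23:00 and 01:00
print(mean_time_of_day(times))       # 0, i.e. midnight
print(sum(times) / len(times))       # 720.0, i.e. noon: the naive mean gets it wrong
print(merge_arrays([[1, 2], [3], []]))  # [1, 2, 3]
```

If the times are spread evenly around the clock (say 00:00, 08:00 and 16:00), the vectors cancel out and the mean is not meaningful, so a real implementation would need to handle that case.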
I will be working on adding such summarization functions to Mathesar's UI this summer. If you have unique ideas for summarization functions, please ping me. If they are relevant, I will discuss them with my mentors, Brent and Sean, and see whether we can implement them.
Acknowledgements:
None of this would have been possible without the support of my parents, friends and teachers. I'm thankful to them.
I am immensely grateful to Srirupa for introducing me to GSoC and supporting me throughout the community bonding period. She will be contributing to Krita this summer (check out her blog here).
Lastly, I would like to thank all the mentors and fellow applicants of the Mathesar organization for their constant support, Dominykas for tirelessly reviewing our draft proposals, and Sean for all his appreciation and encouragement.