GSoC '23: Add More Summarization Functions
Hey everyone, I'm Aritra, a senior pursuing my B.E. degree in
Electrical Engineering at Jadavpur University. This summer, I will
work on adding more summarization functions to Mathesar UI as a part
of GSoC 2023. I have shared my GSoC acceptance experience in my
first blog (In case you have missed that, here is the
link to the same). In this blog, I
will share more about the project and my progress till now.
Here is my
LinkedIn profile,
GitHub profile
and portfolio. There you will find
numerous ways to get in touch with me. I will be writing more blogs
on my GSoC experience, you can find them
here.
What are summarization functions?
In the context of databases, summarization functions, also known as
aggregate functions, are used to calculate and return a summary
(sum, mean, median, array aggregation etc.) of some columns based on
a Grouping column.
Let's say we have this table:
Players | Level | Matches | Goals |
---|---|---|---|
C. Ronaldo | International | 198 | 122 |
L. Messi | International | 174 | 102 |
L. Messi | Club | 853 | 704 |
C. Ronaldo | Club | 968 | 715 |
Now we want to calculate the total number of matches and goals
scored by each player. How do we do that?
If we know DBMS and SQL, it's fairly simple:
SELECT
Players,
SUM(Matches) as Total_matches,
SUM(Goals) as Total_goals,
FROM table
GROUP BY
Players
Where SUM() is an aggregate function. Similarly,
Pivot table in Excel gives us access to similar feature.
Here are different aggregate functions available in T-SQL, PostgreSQL and Excel.
Mathesar - Get more control over database
Mathesar is a project that aims to deliver database assets to
non-technical personnel. A person with no previous knowledge of DBMS
and Excel should be able to summarize data using intuitive and
user-friendly Mathesar UI, which is our goal. Mathesar currently
supports count and distinct_array_to_agg aggregates,
and my mentors
Brent and
Sean (and me :3) will
work on adding more summarization functions to Mathesar UI this
summer. If you are interested in exploring the related feature
requests and pull requests, here is the link to the
meta
issue.
Following is a video describing the newly implemented sum function.
(full-screen mode recommended)
What I have learnt:
While working on this project, I have learnt sooo... many things
(only two weeks have passed). Why should I expect less from such a
collaborative community?
Though the project stack mentions front-end, backend and
database, my project is more related to the backend (Django) and
database (PostgreSQL and SQLAlchemy). During the implementation of
aggregate functions, I had to study those functions as well as the
current workflow of Mathesar APIs. Moreover, I am getting more
familiar with TypeScript and Svelte.
Besides, Git is one of
the most popular VCSs and is super-important when it comes to
contributing to open-source. So, Sean and Brent explained to me some
concepts of Git (like rebasing, stacking of PRs etc.) and showed me
how they manage so many branches and PRs with Git. Those sessions
helped me a lot.