Monday, January 29, 2024

Who Contributed to PostgreSQL Development in 2023?

As in previous years, I've pulled together a few statistics on code contributions to PostgreSQL. See previous posts in this series for methodology and caveats. I calculate that, in 2023, there were 221 people who were the principal author of at least one PostgreSQL commit. 66% of the new lines of code were contributed by one of 18 people, and 90% of the new lines of code were contributed by one of 50 people. Here they are. Asterisks indicate non-committers.

 #  |              author              | lines | pct_lines | commits
----+----------------------------------+-------+-----------+---------
  1 | Tom Lane                         | 15686 |      9.27 |     225
  2 | Robert Haas                      | 12272 |      7.25 |      42
  3 | Jeff Davis                       |  9035 |      5.34 |      61
  4 | Alvaro Herrera                   |  8750 |      5.17 |      51
  5 | Peter Eisentraut                 |  8301 |      4.91 |     240
  6 | Michael Paquier                  |  7404 |      4.38 |     111
  7 | Nikita Glukhov [*]               |  6880 |      4.07 |       3
  8 | Andres Freund                    |  6510 |      3.85 |     114
  9 | Hou Zhijie [*]                   |  4956 |      2.93 |      24
 10 | Heikki Linnakangas               |  4389 |      2.59 |      48
 11 | Bruce Momjian                    |  4259 |      2.52 |      95
 12 | Melanie Plageman [*]             |  4220 |      2.49 |      44
 13 | Nathan Bossart                   |  3982 |      2.35 |      69
 14 | David Rowley                     |  3923 |      2.32 |      65
 15 | Thomas Munro                     |  3731 |      2.21 |      83
 16 | Bertrand Drouvot [*]             |  3398 |      2.01 |      33
 17 | Joseph Koshakow [*]              |  2893 |      1.71 |       9
 18 | Tomas Vondra                     |  2481 |      1.47 |      29
 19 | Georgios Kokolatos [*]           |  2464 |      1.46 |       7
 20 | Andrey Lepikhov [*]              |  2455 |      1.45 |       2
 21 | Dean Rasheed                     |  2382 |      1.41 |      23
 22 | Amit Langote                     |  2117 |      1.25 |      27
 23 | Pavel Stehule  [*]               |  1879 |      1.11 |       2
 24 | Bharath Rupireddy [*]            |  1825 |      1.08 |      36
 25 | Richard Guo [*]                  |  1710 |      1.01 |      40
 26 | Daniel Gustafsson                |  1652 |      0.98 |      47
 27 | Juan Jose Santamaria Flecha  [*] |  1650 |      0.98 |       1
 28 | Brar Piening [*]                 |  1512 |      0.89 |       3
 29 | Peter Geoghegan                  |  1471 |      0.87 |      39
 30 | Hayato Kuroda [*]                |  1410 |      0.83 |      18
 31 | Dag Lem [*]                      |  1315 |      0.78 |       1
 32 | Jacob Champion [*]               |  1287 |      0.76 |      10
 33 | Jelte Fennema [*]                |  1205 |      0.71 |      11
 34 | Justin Pryzby [*]                |  1018 |      0.60 |      13
 35 | Alexander Korotkov               |   975 |      0.58 |      27
 36 | Jim Jones [*]                    |   941 |      0.56 |       2
 37 | Stephen Frost                    |   875 |      0.52 |       8
 38 | Tommy Pavlicek [*]               |   866 |      0.51 |       1
 39 | Onder Kalaci [*]                 |   852 |      0.50 |       4
 40 | Anastasia Lubennikova [*]        |   830 |      0.49 |       1
 41 | Masahiro Ikeda [*]               |   780 |      0.46 |       9
 42 | Andrei Zubkov [*]                |   749 |      0.44 |       2
 43 | Alexander Pyhalov [*]            |   725 |      0.43 |       2
 44 | Matthias van de Meent [*]        |   716 |      0.42 |       7
 45 | Alexander Lakhin [*]             |   695 |      0.41 |      22
 46 | Andrew Dunstan                   |   686 |      0.41 |      20
 47 | John Naylor                      |   653 |      0.39 |       9
 48 | Konstantin Knizhnik [*]          |   644 |      0.38 |       2
 49 | Maxim Orlov [*]                  |   635 |      0.38 |       5
 50 | Vignesh C [*]                    |   626 |      0.37 |      14
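
For readers who want to check a figure like "66% of the new lines came from 18 people" against the table above, here is a minimal sketch. It assumes the cutoff is simply the smallest number of top-ranked authors whose combined pct_lines reaches the threshold; the function name is hypothetical, the actual script behind these statistics may work differently, and the percentages used are the rounded values printed in the table, so the result is approximate.

```python
from itertools import accumulate
from typing import Sequence

# pct_lines for rows 1-18 of the author table, copied as printed (rounded values).
TOP_18_PCT = [9.27, 7.25, 5.34, 5.17, 4.91, 4.38, 4.07, 3.85, 2.93,
              2.59, 2.52, 2.49, 2.35, 2.32, 2.21, 2.01, 1.71, 1.47]


def authors_needed(pct_lines: Sequence[float], threshold: float) -> int:
    """Return the smallest number of top authors whose shares sum to threshold.

    Returns -1 if the listed shares never reach the threshold.
    """
    ranked = sorted(pct_lines, reverse=True)  # largest contributors first
    for count, running_total in enumerate(accumulate(ranked), start=1):
        if running_total >= threshold:
            return count
    return -1


print(authors_needed(TOP_18_PCT, 66.0))  # prints 18
```

With the first 17 rows alone the running total stays just under 66%, which is consistent with the count of 18 quoted in the post.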

As usual, I'm also interested in which committers did the most work to commit patches for which they themselves were not the principal author. Here's how that looked in 2023.

 #  |     committer      | lines | pct_lines | commits
----+--------------------+-------+-----------+---------
  1 | Tom Lane           | 13527 |     18.24 |     113
  2 | Michael Paquier    | 10959 |     14.78 |     209
  3 | Amit Kapila        |  9119 |     12.30 |      78
  4 | Alexander Korotkov |  6448 |      8.70 |      26
  5 | Alvaro Herrera     |  5850 |      7.89 |      18
  6 | Tomas Vondra       |  4265 |      5.75 |      18
  7 | Andres Freund      |  4239 |      5.72 |      40
  8 | Daniel Gustafsson  |  4228 |      5.70 |      55
  9 | Dean Rasheed       |  3571 |      4.82 |       9
 10 | Peter Eisentraut   |  2948 |      3.98 |      45
 11 | David Rowley       |  1914 |      2.58 |      41
 12 | Amit Langote       |  1398 |      1.89 |       3
 13 | Andrew Dunstan     |  1021 |      1.38 |      10
 14 | Robert Haas        |  1007 |      1.36 |      15
 15 | Masahiko Sawada    |   904 |      1.22 |       7
 16 | Nathan Bossart     |   600 |      0.81 |      19
 17 | Thomas Munro       |   497 |      0.67 |      10
 18 | Peter Geoghegan    |   455 |      0.61 |       7
 19 | John Naylor        |   234 |      0.32 |       6
 20 | Bruce Momjian      |   221 |      0.30 |      26
 21 | Noah Misch         |   212 |      0.29 |       5
 22 | Heikki Linnakangas |   208 |      0.28 |      13
 23 | Tatsuo Ishii       |   121 |      0.16 |       3
 24 | Jeff Davis         |    98 |      0.13 |       9
 25 | Etsuro Fujita      |    94 |      0.13 |       1
 26 | Stephen Frost      |     7 |      0.01 |       1
 27 | Fujii Masao        |     1 |      0.00 |       1

Finally, here are people who sent at least 100 emails to pgsql-hackers in 2023.

 count |         name
-------+-----------------------
  1772 | Tom Lane
  1690 | Andres Freund
  1508 | Michael Paquier
  1020 | Nathan Bossart
   988 | Amit Kapila
   793 | Peter Eisentraut
   775 | Robert Haas
   558 | Tomas Vondra
   528 | Thomas Munro
   520 | Daniel Gustafsson
   516 | Alvaro Herrera
   510 | Peter Geoghegan
   500 | Jeff Davis
   463 | Peter Smith
   416 | David Rowley
   402 | Andrew Dunstan
   384 | Bertrand Drouvot
   382 | Hayato Kuroda
   372 | Bruce Momjian
   340 | Justin Pryzby
   337 | Masahiko Sawada
   320 | Vignesh C
   319 | Kyotaro Horiguchi
   316 | Bharath Rupireddy
   294 | Pavel Stehule
   281 | Richard Guo
   263 | Ashutosh Bapat
   259 | Melanie Plageman
   253 | John Naylor
   243 | Aleksander Alekseev
   226 | Matthias Van De Meent
   212 | Heikki Linnakangas
   208 | Zhijie Hou
   206 | Jian He
   203 | Tristan Partin
   197 | Shveta Malik
   184 | Jacob Champion
   178 | Amit Langote
   177 | Laurenz Albe
   163 | Jelte Fennema
   161 | David G. Johnston
   160 | Dean Rasheed
   154 | Dilip Kumar
   148 | Tatsuo Ishii
   144 | Stephen Frost
   144 | Jonathan S. Katz
   142 | Alexander Korotkov
   131 | Karl O. Pinc
   124 | Julien Rouhaud
   124 | Alexander Lakhin
   115 | Noah Misch
   113 | Joe Conway
   101 | Vik Fearing
   100 | Gurjeet Singh

As always, it's important to keep in mind that there are many important contributions to the PostgreSQL project other than development, and that these statistics don't fully or accurately capture even the work that goes into development. I present this just as an aid to understanding some of what goes on in the development community, not as the last word in any way.

Tuesday, January 09, 2024

Incremental Backups: Evergreen and Other Use Cases

As of this writing, I know of three ways to make use of the incremental backup feature that I committed near the end of last month. I'll be interested to see how people deploy it in practice. The first idea is to replace some of the full backups you're currently doing with incremental backups, saving backup time and network transfer. The second idea is to do just as many full backups as you do now, but add incremental backups between them, so that if you need to do PITR, you can use pg_combinebackup to reach the latest incremental backup before the point to which you want to recover, reducing the amount of WAL that you need to replay, and probably speeding up the process quite a bit. The third idea is to give up on taking full backups altogether and only ever take incremental backups.
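
To make the second idea concrete, here is a minimal sketch of choosing which backups to hand to pg_combinebackup for a given recovery target. It assumes a simple linear chain in which every incremental backup was taken against the backup immediately before it; the `Backup` record, the helper names, and the directory paths are all hypothetical and not part of PostgreSQL. The only PostgreSQL-specific detail relied on is that pg_combinebackup takes the backup directories from oldest to newest, starting with the full backup, and writes the result to the directory given with `-o`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    path: str            # hypothetical directory holding the backup
    taken_at: datetime   # when the backup was taken
    incremental: bool    # False for a full backup


def chain_for_recovery(backups: List[Backup], target: datetime) -> List[str]:
    """Return backup paths, oldest first, ending at the latest backup before target.

    Assumes each incremental backup depends on the backup taken just before it.
    """
    eligible = sorted((b for b in backups if b.taken_at <= target),
                      key=lambda b: b.taken_at)
    # The chain must start at the most recent full backup before the target.
    full_positions = [i for i, b in enumerate(eligible) if not b.incremental]
    if not full_positions:
        raise ValueError("no full backup precedes the recovery target")
    return [b.path for b in eligible[full_positions[-1]:]]


def combine_command(chain: List[str], output_dir: str) -> List[str]:
    """Build (but do not run) a pg_combinebackup argument list for the chain."""
    return ["pg_combinebackup", "-o", output_dir, *chain]


backups = [
    Backup("/backups/sun_full", datetime(2024, 1, 7), incremental=False),
    Backup("/backups/mon_incr", datetime(2024, 1, 8), incremental=True),
    Backup("/backups/tue_incr", datetime(2024, 1, 9), incremental=True),
    Backup("/backups/wed_incr", datetime(2024, 1, 10), incremental=True),
]
chain = chain_for_recovery(backups, target=datetime(2024, 1, 9, 12, 0))
print(combine_command(chain, "/restore/data"))
```

For a target of midday on January 9, the chain stops at the Tuesday incremental, so WAL replay only has to cover the hours after that backup rather than everything since the Sunday full backup.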

Wednesday, January 03, 2024

Incremental Backup: What To Copy?

Five days before Christmas I committed my patch to add incremental backup to PostgreSQL. Actually, I've been committing preparatory patches for some months now, but December 20 saw the two main patches land. Since then, there's been a bunch of bug-fix commits, and there are still a few pending items that need to be addressed, but the core of the feature is now committed. If you want a quick overview of the feature, Lukas Fittl has a great video about that. Here, I'd like to talk about the architecture of the feature itself in a little more detail, and specifically about how we decide which data to copy.

Wednesday, December 20, 2023

Praise, Criticism, and Dialogue

When my children were little and I was trying to figure out how to be a parent, I read someplace that you need to have five positive interactions with your child for each negative one to maintain a good relationship. I don't know whether that is fact or myth; a quick Google search suggests that the origin of the idea was in a study about how married couples argue, the idea being that in a good marriage, positive things continue to happen even amidst disagreement. It's wise to be wary about applying a number discovered in a very specific context more generally, but there's a compelling idea here: positive interactions build us up, and negative ones break us down, regardless of whether we're talking about a spouse, a child, or, say, the PostgreSQL community. Too many negative interactions and we just feel like giving up.

Wednesday, June 14, 2023

The PostgreSQL Documentation and the Limitations of Community

In my opinion, the PostgreSQL documentation is simultaneously excellent and fairly poor, and both its excellence and its shortcomings are direct results of the process by which the documentation is produced. The PostgreSQL documentation is stored in the same git repository as the source code, and anyone who patches the source code so as to change documented behavior must also patch the documentation to match.

Thursday, May 25, 2023

Do I Really Need That backup_label File?

I'm sure you already know what I'm going to tell you: "Of course you need that backup_label file. How could you even think that you don't need that backup_label file?" Well, you're right. That is what I'm going to say. But do you know why you need that backup_label file? If you were to remove that backup_label file (or fail to create it in the first place, in cases where that is your responsibility), what exactly is the bad thing that would happen to you?

Friday, April 14, 2023

Who Contributed to PostgreSQL Development in 2022?

As in previous years, I've pulled together a few statistics on code contributions to PostgreSQL. See previous posts in this series for methodology and caveats. I calculate that, in 2022, there were 192 people who were the principal author of at least one PostgreSQL commit. 66% of the new lines of code were contributed by one of 14 people, and 90% of the new lines of code were contributed by one of 40 people. Here they are. Asterisks indicate non-committers.