I recently tried to use gsDesign to calculate sample sizes for a language model question-answering experiment. Unfortunately, it didn't seem to work. I was calculating the size for a fixed-sample design both with and without the package. The package gave me a sample size more than twice as large as the other methods I tried. The other numbers were consistent with each other, so they seemed likely to be correct. I couldn't figure out what was going wrong.
The punchline: gsDesign reports the total sample size across both groups, which is twice the number of questions, because I am comparing two models on the same set of N questions. I eventually figured this out by trying the similar rpact library. rpact's report breaks the sample size down by group (control vs. treatment), which showed that each group was roughly the size I would have expected from my other calculations. Not exactly the expected size, but pretty close.
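To make the per-group versus total distinction concrete, here is a minimal sketch in Python using the textbook normal-approximation formula for comparing two means. This is not what gsDesign or rpact compute internally, and the effect size, standard deviation, and function name are hypothetical values chosen for illustration only.

```python
import math
from statistics import NormalDist


def per_group_n(delta, sd, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Textbook normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2.
    All inputs here are illustrative assumptions, not values from my experiment.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical effect: a difference of 0.1 with a standard deviation of 0.5.
n_per_group = per_group_n(delta=0.1, sd=0.5)

# A clinical-trial style report counts subjects in both arms together.
n_total = 2 * n_per_group

print(f"per group: {n_per_group}, total: {n_total}")
```

In my setting the number of questions corresponds to the per-group figure, since both models answer the same questions. A report that only shows the total will therefore look about twice as large as expected.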
In retrospect it makes sense. These packages are explicitly meant for planning clinical trials. There you are dividing a sample of human patients between treatment and control groups. I'm definitely not using this code for its original intended purpose.