This year's exercise is done, so the question is what to do about next year's. I followed some links from a blog post Mirjam included in an answer to another question, and reached IMPACT-RISK, which lists (and is an acronym for) 10 major downsides of generative AI. This led me to wonder whether you can not only achieve your course goals by keeping the exercise, but also avoid a repeat of this year's refusals and address some of the other downsides. (Your exercise probably already addresses several of the technical ones.)
I'm going to suggest two potential lines of development; I'll come to the environmental one second because it makes more sense after the first.
First: Can it become a group exercise, where students work in a small group, perhaps of three, to decide exactly how they will prompt the AI to get an answer to the question, and also to critique that answer? (While it seems most obvious for students to do this while sitting round a table, I see no reason why it couldn't all be done by typed chat messages.) This will (or should) reduce the number of prompts from the whole class, and therefore the environmental costs, because prompts are submitted per group rather than per student. It will also provide an experience that is not 'Knockoff', because it is based around normal human interactions.
Second: Can you integrate a discussion of the environmental impact into the task, so students are encouraged to think carefully before they start interacting with AI? (https://what-uses-more.com/ gives some rough estimates of the demands of technology use, and suggests normalising practices such as Zoom calls without video (at least for most participants) and setting streaming quality to the lowest possible, unless it really does make a difference.) Things to discuss might include: do they want to use the question as posed, or do they want to add more instructions or information (e.g., how long and what sort of answer do they want)? And what about follow-ups: how many prompts will they allow themselves to use to get an answer? They could set a 'ration' on their resource use, or on the number of prompts used/outputs generated. If they have a series of prompts, the ration could also cover the time between starting and stopping interacting with the genAI, including the time spent reading the output and deciding how to react to it.
I barely use genAI, but one thing I find really annoying is that it's not obvious how to enter a prompt in two parts, such as: I want this sort of answer/reaction to the following. [Hit return.] [Here is the question/actual instruction]. The 'hit return' stage initiates an (entirely unnecessary and unwanted) response. It may be worth flagging this, and also how to get the desired behaviour, so prompts really can be kept to a minimum.
A quite separate extension, which has very little to do with your original question, could be for different subgroups to try different models and discuss what the differences can mean, e.g., for decisions about choosing tools that are fit for what one wants to do, access, 'productivity' in the face of data ownership, etc.