Hey everyone,
I'm currently using OpenAI's GPT-3 and while the results have been great, the API costs are starting to add up with the volume we process. We're trying to find ways to optimize these costs without taking a hit on response quality, and I thought I'd reach out to see what strategies you all have tried.
Here's what we've considered or tested so far:
I'd love to hear your thoughts or if you have any other creative solutions!
Thanks!
– Alex
Have you tried setting a maximum token limit on each API call? A hard cap on the number of tokens per request keeps usage under control. We found that 90% of the time we didn't need responses longer than 200 tokens.
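In case it helps, here's a rough sketch of the idea, not our actual code. It builds a completions request body with `max_tokens` set and estimates the worst-case spend if every response used its full allowance. The helper names, the model name, and the price in the usage note are placeholders I made up for illustration; plug in your own model and the current rate from the pricing page.

```python
def build_completion_request(prompt, max_tokens=200, model="text-davinci-003"):
    """Build a completions request body with a hard cap on output tokens.

    The model name is only a placeholder; use whichever model you call today.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def worst_case_cost(n_requests, prompt_tokens, max_tokens, price_per_1k_tokens):
    """Upper bound on spend, assuming every response hits the cap.

    price_per_1k_tokens is a hypothetical figure supplied by the caller.
    """
    total_tokens = n_requests * (prompt_tokens + max_tokens)
    return total_tokens / 1000 * price_per_1k_tokens
```

For example, `worst_case_cost(1000, 300, 200, 0.02)` gives an upper bound for 1,000 requests with 300-token prompts and a 200-token cap at a made-up price of 0.02 per 1K tokens. Since the cap bounds the output side, the number you get is a ceiling rather than a forecast.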
Hey Alex, I've been in the same boat with API costs recently. Using GPT-J for non-critical tasks lowered our expenses significantly. It's an open-source model with no per-token API fees (you still pay for hosting if you run it yourself), and it does a pretty decent job on simpler tasks! Also, have you considered caching repeated requests, especially for frequently asked questions? That saved us quite a bit.
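Here's a minimal sketch of the caching idea, assuming an in-memory dict is enough and that serving the same answer for the same question is acceptable (so it suits FAQ-style prompts more than creative ones). `ResponseCache` and `call_model` are hypothetical names; `call_model` stands in for whatever function actually hits the API in your code.

```python
import hashlib


class ResponseCache:
    """In-memory cache keyed on a normalized prompt.

    call_model is a caller-supplied function (hypothetical here) that takes
    a prompt string and returns the model's response.
    """

    def __init__(self, call_model):
        self._call_model = call_model
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt):
        # Lowercase and collapse whitespace so trivially different
        # phrasings of the same question map to one cache entry.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, prompt):
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: pay for one API call and remember the result.
        self.misses += 1
        response = self._call_model(prompt)
        self._store[key] = response
        return response
```

You'd wrap your existing call like `cache = ResponseCache(my_api_call)` and use `cache.get(prompt)` everywhere. The hit and miss counters make it easy to check whether the cache is actually saving calls. For anything beyond a single process you'd likely want a shared store and an expiry policy, which this sketch leaves out.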
I'm curious about your experience with Cohere. Did you notice any significant difference in quality or response time compared to OpenAI? Also, with batch processing, has anyone found a sweet spot for batch size to balance cost efficiency with performance latency?