While experimenting with evaluator testing in LangSmith, I ran into a subtle issue that shows how a small implementation detail can break otherwise standard behavior.
I created a “Correctness” evaluator for my model. The model requires a specific configuration, incl...