LESSON: Refining NLG systems consistently
Refinement does not fix consistency problems.
In one of Arria’s weather-related projects, different domain experts, stakeholders, and users had different opinions about what the system should do. We attempted to incorporate everyone’s views by having multiple refinement stages, each with a different set of stakeholders. However, this meant that some aspects of the narrative were changed repeatedly, without converging on a consensus.
For example, the system initially reported wind speed in four-knot ranges, such as “20-24”. In the first refinement phase, we were asked to use “10 or less” for low wind speeds instead of a four-knot range such as “6-10”, so we made this change. However, in the next refinement phase (with feedback from a different user), we were asked to drop “10 or less” and use four-knot ranges throughout; that is, to go back to the original functionality.
There were many such cases, and some changes ended up being done and undone three times.
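One way to make such a disagreement visible is to treat the contested wording as a single named setting with a recorded decision, so that each refinement phase has to change that decision instead of quietly rewriting the rule. The following is a minimal Python sketch of the wind-speed example, not the actual Arria implementation. All names in it (LowWindStyle, WindWordingPolicy, describe_wind_speed, decided_by) are hypothetical, and it assumes the forecast already supplies a low and a high value in knots.

```python
from dataclasses import dataclass
from enum import Enum


class LowWindStyle(Enum):
    """Hypothetical switch for the one wording choice stakeholders disagreed on."""
    RANGE = "range"      # always print a range, e.g. "6-10"
    OR_LESS = "or_less"  # print "10 or less" for low wind speeds


@dataclass(frozen=True)
class WindWordingPolicy:
    # Assumed record of a decision: the chosen style and who signed it off.
    low_wind_style: LowWindStyle
    decided_by: str
    low_wind_threshold: int = 10  # knots; taken from the "10 or less" example


def describe_wind_speed(low: int, high: int, policy: WindWordingPolicy) -> str:
    """Render a wind-speed range in knots according to the agreed policy."""
    if low > high:
        raise ValueError("low must not exceed high")
    use_or_less = (
        policy.low_wind_style is LowWindStyle.OR_LESS
        and high <= policy.low_wind_threshold
    )
    if use_or_less:
        return f"{policy.low_wind_threshold} or less"
    return f"{low}-{high}"


# The two conflicting requests become two values of one setting.
phase_one = WindWordingPolicy(LowWindStyle.OR_LESS, decided_by="first user group")
phase_two = WindWordingPolicy(LowWindStyle.RANGE, decided_by="second user group")

print(describe_wind_speed(6, 10, phase_one))   # 10 or less
print(describe_wind_speed(6, 10, phase_two))   # 6-10
print(describe_wind_speed(20, 24, phase_one))  # 20-24
```

With the choice isolated like this, a request to switch styles shows up as a change to one recorded decision, which makes it easier to notice when a later phase is reversing an earlier one and to escalate the conflict.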
Notice
Refinement is a great way of improving NLG systems, but it does not resolve differences of opinion between stakeholders. If developers encounter conflicting opinions, these should be explicitly addressed and resolved, perhaps by a product owner.