As a follow-up to the Tropo voicemail detection application demo, I wanted to expand on the topic of accuracy for voicemail detection on outbound calls. Since detection accuracy is an ambiguous term, open to a lot of interpretation, some additional insight is called for.
In truth, the only way to know whether an answered call was detected accurately is to have a human score the call. In other words, someone has to actually determine whether the call was answered by a human or a machine and compare that against the automated detection. This can be accomplished in a couple of different ways.
The best method, and the one that yields the most honest results, is to record all outbound calls, or at least a random sampling of them. From there, log the detection result (HUMAN, MACHINE, etc.) in the application and, in parallel, have a human note the actual result. Accuracy is then the percentage of calls where the manual review and the automated detection logged the same result.
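The agreement scoring described above is simple to compute once you have both sets of labels. Here's a minimal sketch, where the call IDs and labels are hypothetical stand-ins for your application's logs and the reviewer's notes:

```python
# Automated detection results logged by the application (hypothetical data).
detected = {"call-1": "HUMAN", "call-2": "MACHINE", "call-3": "HUMAN", "call-4": "MACHINE"}

# Results noted by a human reviewer listening to the recordings.
reviewed = {"call-1": "HUMAN", "call-2": "HUMAN", "call-3": "HUMAN", "call-4": "MACHINE"}

# Accuracy = fraction of calls where the two labels agree.
matches = sum(1 for call_id, label in detected.items() if reviewed[call_id] == label)
accuracy = matches / len(detected)
print(f"{matches}/{len(detected)} calls agree: {accuracy:.0%}")
```

Because every call (or a random sample of calls) is scored regardless of which way the detector classified it, this number reflects mistakes in both directions: machines mislabeled as humans and humans mislabeled as machines.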
The second approach is the one more commonly used to score voicemail detection accuracy. Outbound calls answered by humans are often routed to a live agent, so when an agent receives a transferred call that was actually answered by a machine, they score it accordingly. This means they're only ever scoring calls that were detected as human. When you hear other providers boast 90%+ detection accuracy rates, this is most likely the method they're using: out of every ten calls they think are human, nine really are. But they have no idea how many calls they classified as machine that were actually answered by a human. So all this metric means is "when we think it was a human, we were right 90% of the time." By the same logic, one could claim a 100% accuracy rate simply by classifying every call as a machine and never picking human at all.
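The agent-scored statistic is what statisticians call precision, and the number it ignores is recall. A hypothetical sketch makes the gap concrete, including the degenerate "always say machine" classifier:

```python
def precision_on_human(detected, actual):
    """Of the calls we labeled HUMAN, how many really were? (the boasted 90% stat)"""
    flagged = [a for d, a in zip(detected, actual) if d == "HUMAN"]
    if not flagged:
        return None  # never picked HUMAN, so there is nothing to be wrong about
    return sum(a == "HUMAN" for a in flagged) / len(flagged)

def recall_on_human(detected, actual):
    """Of the calls a human actually answered, how many did we catch?"""
    answered = [d for d, a in zip(detected, actual) if a == "HUMAN"]
    return sum(d == "HUMAN" for d in answered) / len(answered)

# Hypothetical campaign: six calls answered by humans, four by machines.
actual = ["HUMAN"] * 6 + ["MACHINE"] * 4

# A classifier that never says HUMAN dodges the precision metric entirely
# while missing every single live answer.
all_machine = ["MACHINE"] * 10
print(precision_on_human(all_machine, actual))  # None -- no transfers to score
print(recall_on_human(all_machine, actual))     # 0.0 -- every human missed
```

Agents scoring transferred calls can only ever measure the first function; the second requires reviewing the calls the detector sent to voicemail handling, which is exactly what the recording-and-review method provides.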
Additionally, testing voicemail detection strictly in a lab setting can yield overly optimistic scores because of the limited number of "real" calls that can actually be run. To get real-world results, score calls during a live outbound campaign that reaches a variety of voicemail/answering machines and individuals who each answer the phone in their own unique way.