Deep learning classification too slow for a real-time service

Problem: On average, classification with our trained neural network took too long for a real-time service.

Solution:
1. Build additional, less complex models with faster inference (e.g. SVMs, decision trees) to handle the most common requests quickly and efficiently.
2. Distill the neural network (https://arxiv.org/pdf/1711.09784.pdf, https://arxiv.org/abs/1910.01108) into a smaller, faster model.
3. Add simple, known suggestions to the UI, so that choosing a suggestion requires only a lookup instead of a classification.

Result: We reduced the average response time and the computation cost. The suggestions also let users complete their requests faster and improved user satisfaction, since they work as guidance and reduce stress.
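
Points 1 and 3 can be combined into a simple cascade: try the lookup first, then the fast model, and fall back to the neural network only when needed. The following is a minimal sketch of that idea, not our production code. The class name `CascadeClassifier`, the confidence threshold of 0.9, the routing rule and both toy models are assumptions made for illustration; in practice the fast model could be an SVM or decision tree and the slow model the original or distilled network.

```python
from typing import Callable, Dict, Tuple

# A model maps a request to (label, confidence in [0, 1]).
Prediction = Tuple[str, float]
Model = Callable[[str], Prediction]


class CascadeClassifier:
    """Hypothetical router: lookup -> fast model -> slow model."""

    def __init__(
        self,
        lookup: Dict[str, str],
        fast_model: Model,
        slow_model: Model,
        threshold: float = 0.9,  # assumed value, tune on validation data
    ) -> None:
        self.lookup = lookup
        self.fast_model = fast_model
        self.slow_model = slow_model
        self.threshold = threshold

    def classify(self, request: str) -> Tuple[str, str]:
        """Return (label, route), where route names the stage that answered."""
        # Stage 1: a known UI suggestion needs only a dictionary lookup.
        if request in self.lookup:
            return self.lookup[request], "lookup"
        # Stage 2: accept the cheap model's answer if it is confident enough.
        label, confidence = self.fast_model(request)
        if confidence >= self.threshold:
            return label, "fast"
        # Stage 3: fall back to the expensive neural network.
        label, _ = self.slow_model(request)
        return label, "slow"


def toy_fast_model(request: str) -> Prediction:
    # Keyword rule standing in for a small SVM or decision tree.
    if "refund" in request.lower():
        return "billing", 0.95
    return "other", 0.40


def toy_slow_model(request: str) -> Prediction:
    # Stub standing in for the trained neural network.
    return "technical", 0.99


cascade = CascadeClassifier(
    lookup={"Reset my password": "account"},
    fast_model=toy_fast_model,
    slow_model=toy_slow_model,
)
```

Calling `cascade.classify(...)` returns the label together with the stage that produced it, so the share of requests reaching the slow model can be logged and the threshold adjusted accordingly.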