Last May, Moses Guttmann, CTO and founder of allegro.ai, presented the “Optimizing SSD Object Detection for Low-power Devices” tutorial at the Embedded Vision Summit. Today, we are happy to share with you a video abstract of this extremely interesting lecture.
Deep learning-based computer vision models have gained traction in applications requiring object detection, thanks to their accuracy and flexibility. For deployment on low-power hardware, single-shot detection (SSD) models are attractive due to their speed when operating on inputs with small spatial dimensions.
The key challenge in creating efficient embedded implementations of SSD lies not in the feature extraction module, but in the non-linear bottleneck of the detection stage, which does not lend itself to parallelization. This limits how far per-frame processing time can be reduced, even with custom hardware.
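The lecture's full details are in the video, but greedy non-maximum suppression (NMS) is the classic example of such a sequential detection-stage step: each decision depends on all previous ones, so the loop cannot simply be parallelized. The following pure-Python sketch is our own illustration (the 0.5 IoU threshold is an arbitrary, common default, not a value from the lecture):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: whether box i is kept depends on every box kept
    before it, which is why the loop is inherently sequential."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the second box overlaps the first and is dropped
```

The cost of this stage grows with the number of candidate boxes fed into it, which is exactly where reducing the prior count pays off.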
In the full lecture, Guttmann describes in detail a data-centric optimization approach to SSD. The approach drastically lowers the number of priors (“anchors”) needed for detection, and thus linearly decreases the time spent on this costly part of the computation. As a result, specialized processors and custom hardware can be better utilized, yielding higher performance and lower latency regardless of the specific hardware used.
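To see why the prior count matters, here is a back-of-the-envelope sketch. The feature-map sizes and anchors-per-cell below are the well-known SSD300 configuration, used only to illustrate the scaling; the specific reductions Guttmann achieves are in the lecture, not here:

```python
def num_priors(feature_maps, anchors_per_cell):
    """Total priors = sum over feature maps of (cells * anchors per cell)."""
    return sum(h * w * a for (h, w), a in zip(feature_maps, anchors_per_cell))

# Classic SSD300 layout: six feature maps of decreasing resolution.
fmaps = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchors = [4, 6, 6, 6, 4, 4]
print(num_priors(fmaps, anchors))  # 8732 priors per image

# Hypothetical pruning: halving anchors per cell halves the prior count,
# and with it the time spent in the detection stage.
pruned = [2, 3, 3, 3, 2, 2]
print(num_priors(fmaps, pruned))  # 4366 priors
```

Because the detection stage's cost is roughly linear in the number of priors, any pruning of the anchor set translates directly into a proportional reduction in that stage's latency.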
For the full version of this video, along with hundreds of others on various embedded vision topics, please visit this link.