This item is available under a Creative Commons License for non-commercial use only
2. ENGINEERING AND TECHNOLOGY
This paper describes a method for producing a video of a road from which the foreground items that obstruct the view of the road, i.e. other vehicles, have been removed. Once the regions occupied by these items have been identified, they are replaced with suitable image regions that closely resemble the original background. The work considers an approach that uses multiple video sequences of the same road (C1...Cn). The video that requires the least repair is designated Cp. All instances of vehicles in each frame of video are identified using a Convolutional Neural Network (CNN). The region associated with each vehicle is then filled using a suitable region taken from the frames of the remaining streams, assuming that at least one of these streams has a visible background region that matches the query region in Cp. To match frames and locate suitable patches, the video sequences ideally need to be aligned both temporally and structurally. To match the frames temporally, a bag of visual words approach is taken. To align the frames structurally, a template search is performed on the regions surrounding the region to be replaced. Given the template matches, the region between these templates in the matching frame is used to fill the area previously occupied by the vehicles, leaving behind only the background.
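The selection and fill steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pick_reference`, `find_template`, `fill_region`) and the `margin` parameter are hypothetical, the vehicle masks are assumed to come from some detector that is not shown, the donor frame is assumed to have already been matched temporally, and the two frames are assumed to differ only by a translation. The template search here is a plain exhaustive sum-of-squared-differences search using a single template strip to the left of the vehicle, which is a simplification of the search over surrounding regions mentioned in the abstract.

```python
import numpy as np


def pick_reference(masks_per_video):
    """Return the index of the video whose vehicle masks cover the fewest pixels.

    masks_per_video: one list of boolean frame masks per video (True = vehicle).
    """
    totals = [sum(int(m.sum()) for m in masks) for masks in masks_per_video]
    return int(np.argmin(totals))


def find_template(image, template):
    """Exhaustive SSD search; returns the (row, col) of the best match."""
    ih, iw = image.shape
    th, tw = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            diff = image[r:r + th, c:c + tw].astype(float) - template
            score = float((diff * diff).sum())
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


def fill_region(target, donor, box, margin=4):
    """Fill box = (top, left, height, width) in target using donor.

    A strip of visible background just left of the box serves as the
    template; the donor patch just right of the matched strip is copied in.
    """
    top, left, h, w = box
    if left < margin:
        raise ValueError("not enough background left of the box for a template")
    template = target[top:top + h, left - margin:left].astype(float)
    r, c = find_template(donor, template)
    patch = donor[r:r + h, c + margin:c + margin + w]
    if patch.shape != (h, w):
        raise ValueError("matched patch falls outside the donor frame")
    out = target.copy()
    out[top:top + h, left:left + w] = patch
    return out
```

In practice the exhaustive search would likely be replaced by an optimised routine, and templates on more than one side of the region would be needed to handle scale or perspective differences between the streams.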
Clifford, W. & Markham, C. (2019). Ghost towns: semantically labelled object removal from video. IMVIP 2019: Irish Machine Vision & Image Processing, Technological University Dublin, Dublin, Ireland, August 28-30. doi:10.21427/1tag-k222