Bridging the semantic gap between the low-level keywords that are extracted automatically from multimedia data and human interpretations of that data has become critical. This research addresses the issue by applying semantic search to multimedia data. It proposes a model that detects entry/exit points and generates labels automatically instead of relying on manual labelling. The model is first applied to a popular, commonly used benchmark dataset. Once its reliability has been verified, it is applied to a test dataset. The results indicate that interpreting the output in multiple ways can improve the retrieved information and make the model usable in practice.