CrowdWorkOut collects workout videos from crowd workers, who act out a script based on the equipment present in an image. The detection and annotation of that equipment is itself crowdsourced. This seemingly complex idea is broken down into three crowdsourcing tasks: EquiAnnot, VidGen, and VidDesc. The tasks are divided so that each one is comprehensive on its own and depends on the others as little as possible.

Check out the Research Paper here!



  • We presented a set of ideas to our professor.

  • Of the 5 ideas presented, the professor selected CrowdWorkOut, considering the available time, resources, and related research.


  • The crowdsourcing approach we adopted is informed by techniques used to build popular datasets such as ILSVRC, Hollywood in Homes (Charades), PASCAL VOC, Flickr, and MSCOCO.

  • We studied research papers on all of these datasets and others.


  • We ran a survey to gather demographic information and respondents' suggestions for workout-worthy objects around them.

  • We divided the overall crowdsourcing task into 3 microtasks: EquiAnnot, VidGen, and VidDesc.

  • EquiAnnot: focuses on obtaining image annotations and labels.

  • VidGen: focuses on obtaining scripts and video links for the workout equipment image annotated in the EquiAnnot task.

  • VidDesc: focuses on obtaining relevant descriptions and annotations for videos collected in the VidGen task.


  • We chose a mix of platforms and techniques, and compared the accuracy of the results obtained for each task on each platform.

  • First, the authors performed each of the three tasks independently. Second, a group of friends and family performed the tasks with access to the authors' in-person help and guidance. Third, we posted the tasks on Amazon Mechanical Turk (AMT) with no restrictions on worker qualifications.


  • We conducted 4 experiments in total given the limited time and resources.

  • The first experiment focuses on crowdsourcing task design and how the task division affects the performance of each task. For each task, we calculated the average accuracy rate, the distribution of good, average, and bad HITs, and the average time registered.

  • The second experiment focuses on how each task design influences the preferences of AMT workers, authors, and family & friends.

  • The last 2 experiments focus on the annotations and labels obtained for images and videos in the EquiAnnot and VidDesc tasks, respectively. The primary metric used to judge the accuracy of the annotations is IoU (Intersection over Union).
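
For readers unfamiliar with IoU, the following is a minimal sketch of how the metric can be computed, not the project's actual evaluation code. It assumes image annotations are axis-aligned bounding boxes given as (x_min, y_min, x_max, y_max), and, as a further assumption, that video annotations can be treated as (start, end) time intervals. The function names `box_iou` and `interval_iou` are hypothetical.

```python
from typing import Tuple

# Assumed annotation formats (not confirmed by the project):
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Interval = Tuple[float, float]            # (start, end), e.g. seconds in a video


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned bounding boxes."""
    # Width and height of the overlapping rectangle, clamped at zero
    # so that non-overlapping boxes contribute no intersection.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


def interval_iou(a: Interval, b: Interval) -> float:
    """Intersection over Union of two 1-D intervals (temporal IoU)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    worker_box = (0.0, 0.0, 2.0, 2.0)     # made-up worker annotation
    reference_box = (1.0, 1.0, 3.0, 3.0)  # made-up reference annotation
    print(f"box IoU: {box_iou(worker_box, reference_box):.3f}")
    print(f"interval IoU: {interval_iou((0.0, 10.0), (5.0, 15.0)):.3f}")
```

An IoU of 1 means the worker's annotation matches the reference exactly, and 0 means the two do not overlap at all. Any cutoff for counting an annotation as correct would be a separate choice that is not specified here.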