Background
The Review UI is a built-in user interface in SageMaker Ground Truth Plus that enables users to inspect label quality and provide feedback until customers are satisfied that the labels accurately represent the ground truth. It builds on the annotation UI with additional options specific to the review workflow, such as marking errors when annotation mistakes are found and providing feedback to annotators.

The Review UI launched in SageMaker Ground Truth Plus at re:Invent 2021, and customers have reported usability issues since its release. This research was designed to validate the requirements already captured from customers and to uncover additional requirements that address those problems.


The UX work presented in this portfolio aimed to solve some of the usability issues reported by customers.

My role: I owned the end-to-end UX process for this project. I led all research activities, including conducting every user interview, created the design concepts, produced high-fidelity mocks, and partnered with the product manager (PM) to define requirements and set priorities.
UI before the UX optimization.


User journey and users
User tasks
Research methodology
Phase 1
• Reviewed and compiled customer feedback provided by OPMs* working directly with customers.
• Conducted a design review of the existing interface.
• Interviewed three OPMs working directly with customers, internal labeling teams, and external vendors.
• Identified new requirements.
• Created design concepts that addressed key customer pain points.
Phase 2 - User interviews to validate initial concepts
• Conducted 15 user interviews:
• 8 QC/Meta QC reviewers
• 6 front-end/full-stack engineers
• 1 ML scientist currently leading ML-assisted annotation efforts
Phase 3 - Concept iteration and feature prioritization
• Iterated on designs based on user feedback.
• Produced final concepts.
• Defined and prioritized requirements: identified new requirements for the Review UI and supported prioritization efforts using the collected data.
Phase 1 - Research findings
Top pain points:
1. The right panel has several usability issues that make it difficult to interact with labels.
2. Reporting errors and providing feedback is a slow, time-consuming process.
3. Visualizing and interacting with the labels created is confusing, especially in jobs with 20+ labels.
4. Filtering options for attributes are not useful.

Research findings
1. The right panel has several usability issues that make it difficult to interact with labels.

The image on the right is a screenshot from a computer used for quality control.
Research findings
2. Reporting errors and providing feedback is a slow, time-consuming process.
Research findings
3. Visualizing and interacting with labels is confusing, especially in jobs with 20+ labels.
Research findings
4. Filtering options for attributes are not useful.

Current experience:
Users can only filter attributes by "All", "1 or more", or "None". This doesn't match the user's mental model when performing this task.

For instance, users should be able to find all LabelCategories with the attribute "occlusion".

Design concepts

• Provide a high-level overview of the LabelCategories required for each labeling job, the total count of labels present, and the number of labels created for each label category, e.g., Pedestrian (34), Vehicles (15), Buildings (21).

• Provide expand and collapse options for LabelCategories, e.g., expand all pedestrian labels while keeping the other LabelCategories collapsed.

• Collapse all labels by default so users can see the labels created per category at first glance.

• Hide label attributes by default. Selecting a label displays its assigned attributes in the right column and contextually next to the label in the preview area (see the sketch below).
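To make the grouping behavior concrete, here is a minimal TypeScript sketch of how the panel state could be modeled. It assumes a flat list of labels with category and attribute fields; all type and function names are illustrative, not the actual Ground Truth Plus data model.

```typescript
// Minimal sketch of the grouped label-panel state (illustrative names).
interface Label {
  id: string;
  category: string;                    // e.g. "Pedestrian", "Vehicle"
  attributes: Record<string, string>;  // e.g. { occlusion: "partial" }
}

interface CategoryGroup {
  category: string;
  labels: Label[];
  expanded: boolean; // collapsed by default
}

// Group a flat list of labels by category and compute per-category counts,
// so the panel can render "Pedestrian (34), Vehicles (15), ..." at a glance.
function groupByCategory(labels: Label[]): CategoryGroup[] {
  const groups = new Map<string, Label[]>();
  for (const label of labels) {
    const bucket = groups.get(label.category) ?? [];
    bucket.push(label);
    groups.set(label.category, bucket);
  }
  return [...groups.entries()].map(([category, items]) => ({
    category,
    labels: items,
    expanded: false, // all categories start collapsed
  }));
}

// Toggle one category, leaving the others untouched
// (e.g. expand all pedestrian labels while other categories stay collapsed).
function toggleCategory(groups: CategoryGroup[], category: string): CategoryGroup[] {
  return groups.map((g) =>
    g.category === category ? { ...g, expanded: !g.expanded } : g
  );
}
```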


• Enable users to provide feedback within the frame itself to avoid context switching. This reduces overall review time.


• Let reviewers mark errors on labels and attributes instead of typing feedback into the comment box. This can reduce the time spent per error from minutes to a few seconds.
• Enable users to apply feedback to a group of labels. Feedback could be applied to a label category, to a label attribute, or to all labels within a selected frame and subsequent frames (frame-based feedback applies only to video object tracking, video object detection, 3D point cloud, and object detection jobs); see the sketch below.
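A rough sketch of how structured, batch-applicable feedback could be modeled instead of free-text comments. The error types, scopes, and field names are assumptions for illustration, not the shipped Ground Truth Plus API.

```typescript
// Hypothetical structured-feedback model: reviewers pick a predefined
// error type and a scope instead of typing a comment per label.
interface ReviewLabel {
  id: string;
  category: string;
  attributes: Record<string, string>;
  frame: number; // frame index, relevant for video / point-cloud jobs
}

type ErrorType =
  | "wrong-category"
  | "wrong-attribute"
  | "inaccurate-boundary"
  | "missing-label";

type FeedbackScope =
  | { kind: "single-label"; labelId: string }
  | { kind: "label-category"; category: string }
  | { kind: "label-attribute"; attribute: string }
  // Only meaningful for video object tracking/detection, 3D point cloud,
  // and object detection jobs.
  | { kind: "frame-onward"; fromFrame: number };

interface Feedback {
  error: ErrorType;
  scope: FeedbackScope;
  note?: string; // optional free text, no longer the primary channel
}

// Expand one piece of feedback to every label it covers.
function resolveTargets(feedback: Feedback, labels: ReviewLabel[]): ReviewLabel[] {
  const s = feedback.scope;
  switch (s.kind) {
    case "single-label":
      return labels.filter((l) => l.id === s.labelId);
    case "label-category":
      return labels.filter((l) => l.category === s.category);
    case "label-attribute":
      return labels.filter((l) => s.attribute in l.attributes);
    case "frame-onward":
      return labels.filter((l) => l.frame >= s.fromFrame);
  }
}
```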

• Include a multi-select drop-down menu that lists all the attributes present in the job, so users can select multiple attributes at once.
• Include a search option with auto-complete: as users type an attribute name, the drop-down displays the attributes that match the typed letters.
• Indicate the selected filters, with an option to remove each one (see the sketch below).
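A small sketch of the proposed attribute filter: multi-select plus prefix-matching auto-complete. Function names and the prefix-match behavior are my assumptions for illustration.

```typescript
// Suggest attributes whose names start with what the user has typed,
// drawn from the set of attributes present in the job.
function suggestAttributes(allAttributes: string[], query: string): string[] {
  const q = query.trim().toLowerCase();
  if (q === "") return allAttributes;
  return allAttributes.filter((a) => a.toLowerCase().startsWith(q));
}

// Keep labels that carry at least one of the selected attributes,
// e.g. selected = ["occlusion"] finds every label with "occlusion".
function filterByAttributes(
  labels: { attributes: Record<string, string> }[],
  selected: string[]
) {
  if (selected.length === 0) return labels; // no filter applied
  return labels.filter((l) => selected.some((a) => a in l.attributes));
}
```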

Activities to define prioritization
For this research, I ran two surveys: one for end users and one for the engineers involved in this project.

Survey for end-users
The goal of this survey was to assess user preference based on how useful each feature would be for increasing efficiency (completing jobs faster) or increasing label accuracy. Users rated each feature on a scale from "extremely useful" to "not useful at all".
Survey for engineers
The goal of this survey was to estimate how difficult each technical implementation would be, in person-weeks, using t-shirt sizes: XS for a relatively easy implementation up to L for a complex one.
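The write-up doesn't specify how the two surveys were combined, but one plausible (hypothetical) way to turn them into a ranked list is a value-over-effort score: map usefulness ratings and t-shirt sizes to numbers and sort by the ratio. The numeric mappings and the assumed 5-point scale below are illustrative, not the weights actually used in the project.

```typescript
// Hypothetical prioritization score: average usefulness divided by
// estimated effort in person-weeks. All mappings are assumptions.
const usefulness: Record<string, number> = {
  "not useful at all": 0,
  "slightly useful": 1,
  "moderately useful": 2,
  "very useful": 3,
  "extremely useful": 4,
};

const effortWeeks = { XS: 1, S: 2, M: 4, L: 8 };

interface FeatureSurvey {
  name: string;
  userRatings: string[]; // one rating per survey respondent
  effort: keyof typeof effortWeeks;
}

function priorityScore(f: FeatureSurvey): number {
  const avg =
    f.userRatings.reduce((sum, r) => sum + (usefulness[r] ?? 0), 0) /
    f.userRatings.length;
  return avg / effortWeeks[f.effort];
}

// Rank features: highest value per week of effort first.
function rank(features: FeatureSurvey[]): FeatureSurvey[] {
  return [...features].sort((a, b) => priorityScore(b) - priorityScore(a));
}
```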

Recommended prioritization
Final screens