Implementation
The GUI was built over seven iterations across 18 weeks (December 2025 to April 2026). Each iteration had a defined objective, produced working software, and closed with a review before the next iteration began. This section documents the development timeline, the system architecture, and the key engineering decisions that shaped the final system.
V-Model alignment: This page corresponds to the base of the V, where decomposed GUI concepts are integrated into a single, testable subsystem build.
The implementation story is still a product story. Each release turned the GUI from a set of selected concepts into something a boxer could actually use in training, with the architecture serving the user flow rather than the other way around.
Development Iterations
The iteration plan followed the product development sequence from Chapter 3: select the concept, build the minimum viable version, check it against user needs, then refine the training flow before moving to the next layer.
A key product insight emerged from user testing at the Robotics Meets AI Showcase in late January 2026 (mid-Iteration 3). Interviews with boxers of varying skill levels revealed distinct patterns in how beginner, intermediate, and advanced practitioners approached the equipment. These observations directly informed the design of the proficiency assessment, a six-question checklist administered during signup that classifies users into three tiers based on their responses. This user-centred approach meant the assessment reflected real training behaviour rather than arbitrary difficulty levels.
System Architecture
The application follows a five-layer architecture. Each layer has a single responsibility, so changes in one layer do not cascade into unrelated parts of the codebase.
From the user's perspective, that architecture exists to keep the training experience smooth. A user should see a stable drill flow and fast feedback, even when the underlying data, integration, or session logic changes.
Presentation Layer
PySide6 widgets: all pages, buttons, labels, input fields, event callbacks, visual feedback
Application Layer
Navigation stack and page history, user session state, configuration management
Business Logic Layer
Combo Curriculum Engine (mastery algorithm), Performance Testing Logic, User Management, Proficiency Assessment
Integration Layer
GuiBridge (ROS 2 QThread bridge), CV interface (ROS topics + file-based fallback), Robot arm control (ROS command topics), Phone dashboard (shared SQLite + JSON command file)
Data Layer
SQLite databases (per-user files), configuration file management
The Integration Layer is the critical enabler for parallel development. Rather than calling hardware interfaces directly, all hardware communication is routed through this layer. During development on a Windows laptop with no hardware connected, the layer switches automatically to mock interfaces that simulate hardware responses. When deployed on the Jetson Orin NX, the mock interfaces are replaced by real implementations without any changes to the layers above.
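The mock-versus-real switch described above can be sketched as a small factory behind a shared interface. This is a minimal illustration, not the project's code: the names `PadSensor`, `MockPadSensor`, `RealPadSensor`, and `make_pad_sensor` are hypothetical, and the `hardware_available` flag stands in for whatever detection the Integration Layer actually performs.

```python
from typing import Protocol


class PadSensor(Protocol):
    """Interface the upper layers depend on (hypothetical name)."""

    def read_impact(self) -> float:
        ...


class MockPadSensor:
    """Simulated hardware for laptop development."""

    def read_impact(self) -> float:
        return 0.0  # canned value standing in for a real reading


class RealPadSensor:
    """Placeholder for the hardware-backed implementation."""

    def read_impact(self) -> float:
        raise NotImplementedError("hardware access is outside this sketch")


def make_pad_sensor(hardware_available: bool) -> PadSensor:
    """Return the real implementation only when hardware is present."""
    return RealPadSensor() if hardware_available else MockPadSensor()


# Upper layers only ever see the PadSensor interface.
sensor = make_pad_sensor(hardware_available=False)
print(type(sensor).__name__, sensor.read_impact())
```

Because callers depend only on the interface, swapping the mock for the real class needs no change above the factory.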
That separation mattered because the product had to keep moving while the rest of the robot was still being developed. The GUI could be tested with realistic training flows before the physical system was fully available.
ROS 2 Integration
The GUI communicates with the broader BoxBunny system through a GuiBridge interface that receives session events and sends user commands. This keeps interface behaviour responsive while preserving a clean boundary between user interaction flow and backend processing. From the user's perspective, this enables live updates for punch confirmations, drill progress, session state, and coaching prompts.
In product terms, the bridge is what makes the GUI feel alive during training. The user sees the session react immediately, rather than feeling like they are waiting on a separate machine in the background.
During laptop-only development, the same interface runs in mock mode so the GUI can be validated without connected hardware. Internal node architecture, transport details, and backend implementation are documented in Section 5.3.
IMU Navigation
A key usability constraint identified during needs finding is that boxers cannot reliably operate a touchscreen while actively training. Research on touch input shows that even moderate hand coverage significantly reduces accuracy on targets below 48px (Parhi, Karlson and Myers, 2006). The GUI is therefore designed for normal touchscreen use with bare hands, while the IMU pads provide the training-time control path.
This feature exists because the training product has to work under movement and fast session changes. The navigation model was chosen to keep the boxer focused on the drill instead of on the screen.
Whenever a round is not in progress, the four pads double as navigation controls. Left pad moves to the previous item, right pad moves to the next, centre pad confirms a selection, and head pad navigates back. This mapping follows the directional convention familiar from physical four-button controllers, which reduces the need for users to learn new interaction patterns (Norman, 2013).
One edge case is the home screen, which has no parent to return to. On the home screen, the head pad opens a quick-access preset overlay instead of attempting a back navigation with nowhere to go.
During active training, all pad navigation is disabled. Without this, punch impacts on the pads would trigger accidental page changes mid-session. The disable state activates during the countdown and active phases of every session and restores automatically during rest and after session completion.
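The pad mapping, the home-screen edge case, and the training lock can be expressed as one small dispatch function. The sketch below is illustrative only: the enum members, action strings, and the `pad_action` function are assumed names, not identifiers from the project.

```python
from enum import Enum
from typing import Optional


class Pad(Enum):
    LEFT = "left"
    RIGHT = "right"
    CENTRE = "centre"
    HEAD = "head"


class Phase(Enum):
    IDLE = "idle"
    COUNTDOWN = "countdown"
    ACTIVE = "active"
    REST = "rest"
    COMPLETE = "complete"


# Pad navigation is disabled during these phases so punches are not
# misread as navigation input.
LOCKED_PHASES = {Phase.COUNTDOWN, Phase.ACTIVE}


def pad_action(pad: Pad, phase: Phase, on_home_screen: bool = False) -> Optional[str]:
    """Translate a pad hit into a navigation action, or None when locked."""
    if phase in LOCKED_PHASES:
        return None
    if pad is Pad.HEAD:
        # The home screen has no parent page, so "back" becomes the preset overlay.
        return "open_presets" if on_home_screen else "back"
    return {Pad.LEFT: "previous", Pad.RIGHT: "next", Pad.CENTRE: "confirm"}[pad]
```

For example, `pad_action(Pad.HEAD, Phase.REST)` yields `"back"`, while any pad hit during `Phase.ACTIVE` yields `None`.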
Session Orchestration
The GUI is the session lifecycle controller for every training mode: technique drills, sparring, free training, and performance tests. Every session begins with an explicit start request from the GUI and ends with an explicit end request that retrieves the session summary.
From the product point of view, this is the core training loop. The user configures a drill, starts it, receives live state feedback, and ends with a summary that supports the next training decision.
The sequence on a typical session is as follows. The user configures the session and taps Start. The GUI sends a request to the backend with the training mode, difficulty, and username, and receives a session ID in return. Throughout the session, state changes arrive from the backend: countdown, active round, rest period, and next round. The GUI uses these to drive the on-screen timer, round counter, and screen transitions. When the final round ends or the user exits early, the GUI sends an end request with the session ID. The backend returns a summary containing punch counts, accuracy scores, and performance metrics, which the results page then displays.
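The request and response sequence above can be sketched as a small controller. This is a hedged illustration: `FakeBackend`, `SessionController`, the method names, the phase strings, and the summary fields are all assumptions made for the example, and the real transport through the GuiBridge is not modelled.

```python
from typing import Dict, Optional


class FakeBackend:
    """Stand-in for the real backend; method names are hypothetical."""

    def start_session(self, mode: str, difficulty: str, username: str) -> str:
        return "session-001"

    def end_session(self, session_id: str) -> Dict[str, object]:
        return {"session_id": session_id, "punch_count": 0}


class SessionController:
    """Tracks the session ID, the current phase, and the round counter."""

    def __init__(self, backend: FakeBackend) -> None:
        self.backend = backend
        self.session_id: Optional[str] = None
        self.phase = "idle"
        self.round_number = 0

    def start(self, mode: str, difficulty: str, username: str) -> None:
        # Explicit start request; the backend answers with a session ID.
        self.session_id = self.backend.start_session(mode, difficulty, username)
        self.round_number = 0

    def on_state_change(self, phase: str) -> None:
        # State changes pushed by the backend drive the timer and round counter.
        if phase not in ("countdown", "active", "rest"):
            raise ValueError(f"unknown phase: {phase}")
        if phase == "active":
            self.round_number += 1
        self.phase = phase

    def end(self) -> Dict[str, object]:
        # Explicit end request; the backend answers with the session summary.
        if self.session_id is None:
            raise RuntimeError("no session in progress")
        summary = self.backend.end_session(self.session_id)
        self.session_id = None
        self.phase = "idle"
        return summary
```

Keeping the session ID inside the controller means an early exit and a normal finish go through the same `end()` path.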
This separation means Section 5.3 (Robot Intelligence) handles scoring and data collection logic, while Section 5.1 (GUI) handles all presentation and user flow logic. The GUI does not need to know how punch scoring is calculated, and the backend does not need to know how results are displayed.
Phone Dashboard
A companion phone dashboard gives users a quick way to check progress and make light training adjustments without walking back to the robot. It is most useful between rounds or after a session, when the boxer wants a fast summary rather than a full touchscreen interaction. More detailed integration notes are in Section 5.3.
Gamification System
The rank and badge system gives users visible progress after a session, which helps keep training feeling rewarding instead of repetitive. The detailed scoring and data model are covered in Section 5.3.
Key Engineering Decisions
Navigation Stack
The application contains over 40 pages managed by a central QStackedWidget. Rather than coding back button destinations manually on each page, a navigation stack records the current page index before every transition. The diagram below illustrates how the stack operates as a user navigates through a typical training flow.
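The stack behaviour can be sketched in plain Python without any Qt dependency. The class and method names below are hypothetical; in the application, the `current` index would be passed to the QStackedWidget's `setCurrentIndex` after each change.

```python
from typing import List


class NavigationStack:
    """Records the page index being left so Back needs no per-page wiring."""

    def __init__(self, home_index: int = 0) -> None:
        self.current = home_index
        self._history: List[int] = []

    def go_to(self, index: int) -> None:
        # Push the page we are leaving, then switch.
        self._history.append(self.current)
        self.current = index

    def back(self) -> int:
        # Pop the most recent page; stay put when there is no history.
        if self._history:
            self.current = self._history.pop()
        return self.current


nav = NavigationStack(home_index=0)
nav.go_to(3)       # e.g. home -> training menu
nav.go_to(7)       # e.g. training menu -> drill setup
print(nav.back())  # returns to page 3
```

Calling `back()` with an empty history leaves the page unchanged, which matches the home screen having no parent to return to.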
Proficiency Assessment
On signup, new users complete a six-question checklist. Each question has three answer options scored 0, 1, or 2. The total score (0 to 12) maps the user to a suggested proficiency level.
| Question | Option 1 (0) | Option 2 (1) | Option 3 (2) |
|---|---|---|---|
| Have you trained boxing before? | Never | A few times | Regularly |
| Do you know the basic punches? | No | Somewhat | Yes |
| Can you throw a basic 1-2-3 combo? | No | With help | Yes |
| Have you done any sparring before? | Never | Once or twice | Yes, regularly |
| How would you describe your fitness level? | Low | Moderate | High |
| Have you used boxing equipment before? | Never | A few times | Regularly |
Table 5.1.2-3: Proficiency Assessment Questions
| Total Score | Suggested Level |
|---|---|
| 0 to 4 | Beginner |
| 5 to 8 | Intermediate |
| 9 to 12 | Advanced |
Table 5.1.2-4: Proficiency Assessment Scoring
The user can accept or override the suggestion on the result page before confirming. This classification determines the default combo difficulty tier shown on the training page.
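The scoring rule in the two tables reduces to a short function. This is a minimal sketch; the function name `suggest_level` and the input validation are assumptions added for the example.

```python
from typing import Sequence


def suggest_level(answers: Sequence[int]) -> str:
    """Map six answers (each scored 0, 1, or 2) to a suggested level."""
    if len(answers) != 6 or any(a not in (0, 1, 2) for a in answers):
        raise ValueError("expected six answers, each scored 0, 1, or 2")
    total = sum(answers)
    if total <= 4:
        return "Beginner"
    if total <= 8:
        return "Intermediate"
    return "Advanced"


# "A few times" / "Somewhat" on every question totals 6.
print(suggest_level([1, 1, 1, 1, 1, 1]))  # Intermediate
```

The returned value is only a suggestion; the override step on the result page would sit on top of this.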
Combo Curriculum and Mastery Algorithm
The curriculum contains 50 combinations: 15 Beginner, 20 Intermediate, and 15 Advanced. Each combo uses a notation system where numbers 1 through 6 represent punch types (Jab, Cross, Lead Hook, Rear Hook, Lead Uppercut, Rear Uppercut), with a "b" suffix for body shots and text labels for defensive movements (slip, block, roll). A combo is considered mastered when the user has completed at least five sessions with an average score of 3.0 out of 5.0 or above. This threshold-based progression model draws on mastery learning theory, which holds that learners should demonstrate competence at one level before advancing to the next (Bloom, 1984). Progress is tracked per-user in the SQLite database and persists across sessions. The self-select sequence builder interface is shown in Appendix 3.
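The notation and the mastery rule can be illustrated together. The sketch assumes that combo tokens are separated by hyphens (as in "1-2-3") and that session scores are stored as a plain list; the function names are hypothetical.

```python
from typing import List, Sequence

PUNCHES = {
    1: "Jab", 2: "Cross", 3: "Lead Hook",
    4: "Rear Hook", 5: "Lead Uppercut", 6: "Rear Uppercut",
}
DEFENSIVE = {"slip", "block", "roll"}


def describe_combo(notation: str) -> List[str]:
    """Expand a combo such as '1-2-3b-slip' into readable move names."""
    moves = []
    for token in notation.split("-"):
        if token in DEFENSIVE:
            moves.append(token)
            continue
        body = token.endswith("b")  # "b" suffix marks a body shot
        name = PUNCHES[int(token[:-1] if body else token)]
        moves.append(f"{name} (body)" if body else name)
    return moves


def is_mastered(scores: Sequence[float]) -> bool:
    """At least five sessions with an average score of 3.0 or above."""
    return len(scores) >= 5 and sum(scores) / len(scores) >= 3.0


print(describe_combo("1-2-3b-slip"))
print(is_mastered([3, 4, 3, 2, 4]))
```

Both conditions in `is_mastered` must hold, so four perfect sessions do not yet count as mastery.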
Sparring Mode: Markov Chain Generation
Sparring mode generates punch sequences using a first-order Markov chain. A first-order Markov chain is a probabilistic sequence model in which the probability of each next state depends only on the current state (Norris, 1997). This property makes it suitable for real-time punch sequence generation within the Jetson Orin NX's compute budget, as no sequence history needs to be stored or evaluated. Each boxing style (Pressure Fighter, Counter Puncher, Infighter, Out-Boxer, Random) defines a transition probability matrix over punch types. Starting from a "start" state, the system picks the next punch based on weighted probabilities, continuing until an "end" state is reached or a safety limit of six punches per combo is hit. Sparring is available to all proficiency levels. The style selection interface is shown in Appendix 3.
Below is the transition matrix for the Counter Puncher style as an example. Each row shows the current state, and the columns show the probability of transitioning to each next punch (or ending the combo).
| From State | 1 (Jab) | 2 (Cross) | 3 (L.Hook) | 4 (R.Hook) | 5 (L.Upper) | 6 (R.Upper) | End |
|---|---|---|---|---|---|---|---|
| start | 0.30 | 0.30 | | | | | 0.40 |
| 1 (Jab) | | 0.40 | | | | | 0.60 |
| 2 (Cross) | | | 0.30 | 0.20 | | | 0.50 |
| 3 (L.Hook) | | | | | | | 0.70 |
| 4 (R.Hook) | | | | | | | 0.80 |
| 5 (L.Upper) | | | | | | | 0.90 |
| 6 (R.Upper) | | | | | | | 1.00 |
Table 5.1.2-6: Counter Puncher Markov Transition Matrix
The remaining probability in rows 3 to 5 (0.30, 0.20, and 0.10 respectively) goes to a single follow-up punch, which is not identified here.
Worked example, tracing one generation pass for the Counter Puncher style:
1. start: Roll probabilities. Jab (0.30), Cross (0.30), End (0.40). Suppose the roll picks Jab.
2. After Jab: Cross (0.40), End (0.60). Suppose the roll picks Cross.
3. After Cross: Lead Hook (0.30), Rear Hook (0.20), End (0.50). Suppose the roll picks End.
Result: the generated combo is 1-2 (Jab, Cross). The Counter Puncher style has high "end" probabilities, so it naturally produces short, precise combinations. Contrast this with the Pressure Fighter, whose low "end" probabilities generate longer, more aggressive sequences.
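The walk-through above can be reproduced with a short generator. This is a sketch under stated assumptions: the `start`, `1`, and `2` rows use the Counter Puncher values from the walk-through, while the rows for states `3` and `4` are placeholders that end the combo immediately. The function names are hypothetical.

```python
import random
from typing import Dict, List

Matrix = Dict[str, Dict[str, float]]

TRANSITIONS: Matrix = {
    "start": {"1": 0.30, "2": 0.30, "end": 0.40},
    "1": {"2": 0.40, "end": 0.60},
    "2": {"3": 0.30, "4": 0.20, "end": 0.50},
    # Placeholder rows (assumption): always end after a hook.
    "3": {"end": 1.0},
    "4": {"end": 1.0},
}
MAX_PUNCHES = 6  # safety limit per combo


def generate_combo(transitions: Matrix, rng: random.Random,
                   max_punches: int = MAX_PUNCHES) -> List[str]:
    """Walk the chain from 'start' until 'end' or the safety limit."""
    state, combo = "start", []
    while len(combo) < max_punches:
        options = transitions[state]
        # The next state depends only on the current state (first-order).
        state = rng.choices(list(options), weights=list(options.values()))[0]
        if state == "end":
            break
        combo.append(state)
    return combo


def path_probability(transitions: Matrix, combo: List[str]) -> float:
    """Probability of generating exactly this combo and then ending."""
    prob, state = 1.0, "start"
    for nxt in combo + ["end"]:
        prob *= transitions[state].get(nxt, 0.0)
        state = nxt
    return prob


print(generate_combo(TRANSITIONS, random.Random()))
print(path_probability(TRANSITIONS, ["1", "2"]))  # 0.30 * 0.40 * 0.50
```

With these values the combo 1-2 from the walk-through has probability 0.30 × 0.40 × 0.50 = 0.06. Note that this sketch can return an empty combo when the first roll picks End; how the real system handles that case is not specified here.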
When a user's weakness profile is available (from previous sparring sessions), the transition weights are blended with a bias multiplier. The blending factor alpha increases gradually with session count (capped at 0.4), so the adaptation strengthens over time without fully overriding the style's character.
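One way such a blend could work is sketched below. The exact formula is not given in this section, so everything here is an assumption for illustration: the linear growth rate of 0.05 per session, the per-punch bias multipliers, and the renormalisation step. Only the cap of 0.4 comes from the text.

```python
from typing import Dict

ALPHA_CAP = 0.4        # stated cap on the blending factor
ALPHA_PER_SESSION = 0.05  # assumed growth rate, for illustration only


def blend_alpha(session_count: int) -> float:
    """Blending factor that grows with session count up to the cap."""
    return min(ALPHA_CAP, ALPHA_PER_SESSION * session_count)


def blend_row(base: Dict[str, float], bias: Dict[str, float],
              alpha: float) -> Dict[str, float]:
    """Mix style weights with biased weights, then renormalise to sum to 1."""
    mixed = {
        state: (1 - alpha) * weight + alpha * weight * bias.get(state, 1.0)
        for state, weight in base.items()
    }
    total = sum(mixed.values())
    return {state: weight / total for state, weight in mixed.items()}


# Example: the user is weak against the cross, so it is thrown more often.
row = blend_row({"2": 0.40, "end": 0.60}, {"2": 2.0}, blend_alpha(10))
print(row)
```

Because alpha never exceeds 0.4, at least 60% of each weight still comes from the style's own matrix before renormalisation, which is what keeps the style recognisable.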
AI Coaching Integration
Users benefit from feedback that appears when it is most useful: quick prompts during training and a more reflective chat view after a session. That keeps the coaching feel light and readable while still giving the boxer a clear next step. More detail on the AI coaching pipeline is in Section 5.3.
Sound System
Audio feedback reinforces task events without requiring the user's visual attention, which is useful during active training when the user is focused on the pads rather than the screen (Gaver, 1989). The GUI uses 18 WAV sound effects, all preloaded at startup so that playback does not wait on file loading.
A priority tier system prevents lower-priority sounds from masking higher-priority ones. Round start and end bells carry the highest priority among session sounds. Countdown ticks sit below bells, hit confirmation sounds sit below ticks, and UI navigation clicks sit at the lowest priority. A navigation click cannot interrupt a bell, but a bell will always cut through regardless of what else is playing. Volume and per-sound toggles are available from the Settings page.
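The tier ordering can be sketched as a small arbiter that decides whether a requested sound may play. The names `Priority` and `SoundArbiter` are hypothetical, audio playback itself is not modelled, and letting a sound of equal priority replace the current one is an assumption.

```python
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    UI_CLICK = 0        # lowest: navigation clicks
    HIT_CONFIRM = 1
    COUNTDOWN_TICK = 2
    BELL = 3            # highest: round start and end bells


class SoundArbiter:
    """Decides whether a new sound may interrupt the one currently playing."""

    def __init__(self) -> None:
        self.playing: Optional[Priority] = None

    def request(self, priority: Priority) -> bool:
        # Reject anything strictly below the sound already playing.
        if self.playing is not None and priority < self.playing:
            return False
        self.playing = priority
        return True

    def finished(self) -> None:
        # Called when the current sound ends, freeing the channel.
        self.playing = None


arbiter = SoundArbiter()
arbiter.request(Priority.BELL)
print(arbiter.request(Priority.UI_CLICK))  # False: a click cannot mask a bell
```

Using `IntEnum` keeps the ordering explicit, so adding a new tier only requires choosing where it sits in the sequence.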
Feature Set
The complete feature set delivered across all seven iterations is summarised below.
References
- Bloom, B.S. (1984). The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring. Educational Researcher, 13(6), 4-16.
- Gaver, W.W. (1989). The SonicFinder: An interface that uses auditory icons. Human-Computer Interaction, 4(1), 67-94.
- Norman, D.A. (2013). The Design of Everyday Things: Revised and Expanded Edition. Basic Books.
- Norris, J.R. (1997). Markov Chains. Cambridge University Press.
- Parhi, P., Karlson, A.K. and Myers, B.A. (2006). Target size study for one-handed thumb use on small touchscreen devices. In Proceedings of the 8th Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI 2006), pp. 203-210.
- Schmidt, R.A. and Lee, T.D. (2011). Motor Control and Learning: A Behavioral Emphasis (5th ed.). Human Kinetics.