
AI for Sounding Out Words 

AI for Sounding Out Words is an applied AI project developed in collaboration with industry partner SonicPhonics, focused on supporting early literacy and phonics learning for children aged 5–7.
The project explores how artificial intelligence and speech analysis can be used to help children sound out words correctly, improve pronunciation, and build confidence in early reading.


The work was carried out in close collaboration with industry clients and aligned with the New Zealand primary school curriculum, addressing a real educational need using applied AI techniques.

What This Project Demonstrates


  • Applied AI development using real-world children’s speech data

  • Speech data preparation and phoneme-level labelling for machine learning

  • Model evaluation under noisy, non-ideal data conditions

  • Translating AI research into a practical educational context

  • Team-based development with industry stakeholders and formal QA processes


01

Project Overview & Team Contributions

The project was delivered by a multidisciplinary student team working closely with SonicPhonics as industry clients to design, develop, and evaluate an AI-assisted phonics learning solution.


The system was designed to analyse children’s spoken phonics and provide feedback aligned with how early literacy is taught in New Zealand classrooms. A strong emphasis was placed on research, evaluation, and iterative improvement rather than producing a polished commercial product.


Key areas of team contribution included:


  • Speech data collection and labelling, using real children’s voice recordings to capture pronunciation patterns and common learning challenges

  • AI model research and development, exploring speech recognition and phoneme-level analysis techniques suitable for young learners

  • System design and prototyping, creating a child-friendly interaction flow to support guided phonics practice

  • Evaluation and iteration, testing model performance and refining approaches based on learning outcomes and client feedback

  • Documentation and reporting, producing technical reports, quality assurance artefacts, and project planning documentation to meet academic and industry expectations


This project combined artificial intelligence, educational technology, and human-centred design to address a real-world learning problem in an industry context.

02

My Role

Within the team, I primarily contributed to the AI development and data preparation aspects of the project.


My work focused on preparing and labelling children’s speech data for phonics analysis, including identifying correct and incorrect pronunciations and structuring audio data for use in model training and evaluation. I supported the development and evaluation of AI models used to recognise and assess spoken phonemes, working with real, imperfect speech data rather than curated datasets.


In addition, I contributed to technical documentation and project reporting, helping translate research findings, system behaviour, and evaluation results into clear, structured documents for both academic assessment and industry stakeholders. Throughout the project, I collaborated closely with teammates and industry clients, contributing to discussions around system design decisions, evaluation criteria, and iterative improvements.


This role allowed me to apply AI concepts to a real educational problem while gaining hands-on experience with speech data, model evaluation, quality assurance processes, and industry-aligned development workflows.


03

AI Pipeline, Artefacts & Results

This section presents the core AI pipeline and technical artefacts developed for the project, illustrating how children’s speech data was transformed into structured inputs for phoneme recognition and evaluation.


The workflow begins with labelled audio recordings of children pronouncing individual phonemes. These recordings were processed using MFCC (Mel-Frequency Cepstral Coefficient) feature extraction, converting raw audio signals into numerical representations suitable for machine learning.
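As a rough illustration of this step (not the project's exact code), the sketch below extracts MFCCs from a single recording using librosa. The file path, 16 kHz sample rate, and 13 coefficients are assumptions chosen for the example.

import librosa

# Hypothetical path to one labelled recording of a child pronouncing a phoneme.
AUDIO_PATH = "data/phoneme_s_child01.wav"

# Load the clip; 16 kHz is a common choice for speech tasks (assumed here).
signal, sr = librosa.load(AUDIO_PATH, sr=16000)

# Convert the raw waveform into 13 MFCCs per analysis frame.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

# Transpose to (num_frames, 13) so each row is one frame's feature vector.
frames = mfcc.T
print(frames.shape)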


To explore patterns in pronunciation and reduce variability in speech data, K-Means clustering was applied to the extracted features. Cluster assignments were then converted into token sequences, forming a structured representation of phoneme pronunciations.
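A minimal sketch of how cluster assignments can become token sequences, using scikit-learn's KMeans; the 64-cluster codebook size is an assumption, and the random arrays stand in for real pooled MFCC frames.

import numpy as np
from sklearn.cluster import KMeans

# Stand-in for MFCC frames pooled across many recordings (real data would
# come from the feature-extraction step above).
rng = np.random.default_rng(0)
pooled_frames = rng.normal(size=(5000, 13))

# Fit K-Means on the pooled frames; 64 clusters is an assumed codebook size.
codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(pooled_frames)

def to_tokens(mfcc_frames):
    # Map each frame to the index of its nearest cluster centre, turning a
    # (num_frames, 13) feature matrix into a discrete token sequence.
    return codebook.predict(mfcc_frames)

one_recording = rng.normal(size=(120, 13))  # one clip's MFCC frames
print(to_tokens(one_recording)[:10])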


These token sequences were used as inputs to a neural network model, which was trained to recognise and assess spoken phonemes, including both single and blended sounds. Model performance was evaluated using F1 scores across individual phonemes, allowing detailed analysis of strengths and limitations across different pronunciation categories.
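The sketch below shows one way such a classifier and evaluation could be wired up in Keras with per-phoneme F1 scores from scikit-learn; the architecture, layer sizes, and synthetic data are illustrative assumptions, not the team's actual model.

import numpy as np
from tensorflow import keras
from sklearn.metrics import f1_score

NUM_TOKENS, SEQ_LEN, NUM_PHONEMES = 64, 120, 10  # assumed sizes

# Synthetic stand-in data: padded token sequences plus integer phoneme labels.
rng = np.random.default_rng(0)
X = rng.integers(0, NUM_TOKENS, size=(400, SEQ_LEN))
y = rng.integers(0, NUM_PHONEMES, size=400)

# A small sequence classifier: embed each token, average over the sequence,
# then predict which phoneme was spoken.
model = keras.Sequential([
    keras.layers.Embedding(NUM_TOKENS, 16),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(NUM_PHONEMES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=5, verbose=0)

# average=None yields one F1 score per phoneme class, mirroring the
# per-phoneme evaluation described above (scored on the training data
# purely for illustration).
preds = model.predict(X, verbose=0).argmax(axis=1)
print(f1_score(y, preds, average=None))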


The artefacts and results shown here reflect an applied AI pipeline, combining signal processing, unsupervised learning, supervised model training, and structured evaluation using real-world, imperfect speech data.
