
AI for Sounding Out Words 

AI for Sounding Out Words is an industry-collaborative applied AI project developed with SonicPhonics, focused on supporting early literacy and phonics learning for children aged 5–7.
The project explores how artificial intelligence and speech analysis can be used to assess how children sound out words, provide phonics-aligned feedback, and support confidence in early reading.


The work was carried out in close collaboration with industry clients and aligned with the New Zealand primary school curriculum, addressing a real educational need using applied AI techniques.

What This Project Demonstrates


  • Applied AI development using real-world children’s speech data and lightweight neural models implemented by the team

  • Speech data preparation and phoneme-level labelling for machine learning

  • Model evaluation using phoneme-level metrics under noisy, non-ideal speech conditions

  • Translating AI research into a practical educational context

  • Team-based development with industry stakeholders and formal QA processes


01

Project Overview & Team Contributions

The project was delivered by a multidisciplinary student team working closely with SonicPhonics as industry clients to design, develop, and evaluate an AI-assisted phonics learning solution.


The system was designed to analyse children’s spoken phonics and provide feedback aligned with how early literacy is taught in New Zealand classrooms. A strong emphasis was placed on research, evaluation, and iterative improvement rather than producing a polished commercial product.


Key areas of team contribution included:


  • Speech data collection and labelling, using real children’s voice recordings to capture pronunciation patterns and common learning challenges

  • AI model research and development, exploring speech processing and phoneme-level analysis techniques suitable for young learners

  • System design and prototyping, creating a child-friendly interaction flow to support guided phonics practice

  • Evaluation and iteration, testing model performance and refining approaches based on learning outcomes and client feedback

  • Documentation and reporting, producing technical reports, quality assurance artefacts, and project planning documentation to meet academic and industry expectations


This project combined artificial intelligence, educational technology, and human-centred design to address a real-world learning problem in an industry context.

02

My Role

Within the team, I primarily contributed to the AI data preparation, tooling enablement, and evaluation aspects of the project.


My work focused on preparing and labelling children’s speech data for phonics analysis, including identifying correct and incorrect pronunciations and structuring audio data for use in model evaluation. I set up and adapted existing project code to enable phonics studio workflows for reliable audio labelling and review, working with real, imperfect speech data rather than curated datasets.

I worked closely with teammates responsible for model implementation, contributing to discussions around CNN- and RNN-based approaches, evaluation criteria, and analysis of model behaviour on real-world speech data. In addition, I contributed to technical documentation and project reporting, helping translate research findings, system behaviour, and evaluation results into clear, structured documents for both academic assessment and industry stakeholders.


This role allowed me to apply AI concepts to a real educational problem while gaining hands-on experience with speech data, model evaluation, quality assurance processes, and industry-aligned development workflows.


03

AI Pipeline, Artefacts & Results

This section presents the core AI pipeline and technical artefacts developed for the project, illustrating how children’s speech data was transformed into structured inputs for phoneme recognition and evaluation.


The workflow begins with labelled audio recordings of children pronouncing individual phonemes. These recordings were processed using MFCC (Mel-Frequency Cepstral Coefficient) feature extraction, converting raw audio signals into numerical representations suitable for machine learning. The resulting feature representations were used to train lightweight neural sequence models implemented in TensorFlow.


To explore patterns in pronunciation and reduce variability in speech data, K-Means clustering was applied to the extracted features. Cluster assignments were then converted into token sequences, forming a structured representation of phoneme pronunciations.
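The clustering-to-tokens step can be sketched with scikit-learn. The cluster count and the random feature matrices below are illustrative placeholders for the project’s pooled MFCC frames.

```python
# Sketch of K-Means tokenisation: frame-level features are clustered, then
# each frame is replaced by its cluster index, yielding a token sequence.
# k=16 and the synthetic data are illustrative, not the project's values.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for MFCC frames pooled across many recordings: (n_frames, n_features)
all_frames = rng.normal(size=(500, 13))

kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(all_frames)

# Tokenise one utterance: each frame becomes a discrete cluster id
utterance = rng.normal(size=(40, 13))
tokens = kmeans.predict(utterance)
```

Replacing continuous feature vectors with discrete cluster ids compresses away speaker-level variability, leaving a token sequence that sequence models can consume directly.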


These token sequences were used as inputs to lightweight CNN- and RNN-based sequence models, which were trained to recognise and assess spoken phonemes, including both single and blended sounds. Model performance was evaluated using F1 scores across individual phonemes, allowing detailed analysis of strengths and limitations across different pronunciation categories.
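A lightweight model of this shape, together with per-phoneme F1 scoring, can be sketched as below. This assumes TensorFlow/Keras and scikit-learn; all sizes (vocabulary, sequence length, class count) and the toy labels are illustrative placeholders, not the project’s real configuration or results.

```python
# Sketch of a lightweight CNN/RNN sequence classifier over cluster-token
# inputs, plus per-phoneme F1 scoring. Sizes and labels are illustrative.
import numpy as np
import tensorflow as tf
from sklearn.metrics import f1_score

VOCAB = 16        # K-Means token vocabulary size
SEQ_LEN = 40      # token frames per utterance (padded/truncated)
N_PHONEMES = 10   # phoneme classes to recognise

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN,)),
    tf.keras.layers.Embedding(VOCAB, 8),               # tokens -> dense vectors
    tf.keras.layers.Conv1D(16, 3, activation="relu"),  # local (CNN) patterns
    tf.keras.layers.GRU(16),                           # sequence (RNN) context
    tf.keras.layers.Dense(N_PHONEMES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Per-phoneme F1: average=None returns one score per class
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1])
per_phoneme_f1 = f1_score(y_true, y_pred, average=None, labels=[0, 1, 2])
```

Scoring with `average=None` is what makes phoneme-level analysis possible: instead of one aggregate number, each phoneme class gets its own F1 score, exposing which sounds the model handles well and which it confuses.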


The artefacts and results shown here reflect an applied AI pipeline, combining signal processing, unsupervised learning, supervised model training, and structured evaluation using real-world, imperfect speech data.

The implementation described above reflects the team’s collective work. My individual contribution focused on data preparation, tooling enablement, evaluation, and documentation.
