The whole point
This is a continuation of my previous posts about starting a new Android project and building its UI with Jetpack Compose.
The thing is, most apps can't be just a flashy interface. They have to do something when you push all those shiny buttons. My goal with NoteGrid was to build a proof-of-concept that could pave the way to a full-fledged MIDI Sequencer -- a music composition tool. I reasoned that doing so would teach me a thing or two about audio programming and let me gauge how feasible such a project might be. This post is about how I got NoteGrid to work with MIDI data and play any synthesized instrument packaged in a soundfont.
Here's what we'll go through in this post:
Audio playback on Android using the MediaPlayer and SoundPool APIs.
Why MIDI support would be more powerful than either of those.
What MIDI APIs are officially supported by Google on Android, and what we're missing.
Open-source solutions to the gaps in official APIs.
Implementing a proof-of-concept of real-time MIDI playback, editing, and file output.
Source code for the completed proof-of-concept app can be found here.
Digging through old code
I first built a version of NoteGrid a couple of years ago, when Compose was still in beta.
The grid itself is made of toggleable square surfaces. The x axis represents time, and the y axis represents pitch. In the video above, a simple scale pattern is programmed into the grid, but you can also have a bit of fun with this simple toy:
I've discussed building (and enhancing) the UI for this screen in a separate post. Clearly, it's a much simpler interface than a full sequencer or DAW would have. However, before I begin the task of building something more fully-featured, I'd like to be sure that I can cover all my use cases where audio playback is concerned.
In my old code, to play the audio when the user presses play and the app reaches an enabled note, I simply used android.media.SoundPool -- a core Android utility. SoundPool makes it very easy to play any sound from a collection of short clips, though it can't handle larger files like full songs. You can use it to trigger sound effects when a button is clicked or something happens in a game, for example.
I only used eight audio files, one for each note of a scale including the root's octave. I played each note in real time by responding to the same Animatable time
value I used to light up the notes on the grid as they were playing. When a note on the grid first lights up, NoteGridViewModel::playSound
is called, which uses SoundPool to play the correct note.
SoundPool is very simple to use. Our entire SoundPoolPlaybackController class looks like this:
class SoundPoolPlaybackController @Inject constructor(@ApplicationContext context: Context) {
    private val synthIds = listOf(
        R.raw.triangle_synth_1_c,
        R.raw.triangle_synth_2_d,
        R.raw.triangle_synth_3_dsharp,
        R.raw.triangle_synth_4_f,
        R.raw.triangle_synth_5_g,
        R.raw.triangle_synth_6_gsharp,
        R.raw.triangle_synth_7_asharp,
        R.raw.triangle_synth_8_highc,
    )

    private val soundPool: SoundPool = SoundPool.Builder()
        .setMaxStreams(8)
        .setAudioAttributes(
            AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_MEDIA)
                .build()
        )
        .build()

    private val soundPoolIds = mutableListOf<Int>()

    init {
        for (id in synthIds) {
            soundPoolIds.add(soundPool.load(context, id, 1))
        }
    }

    fun play(scaleDegree: Int): Int {
        return soundPool.play(soundPoolIds[scaleDegree], 1f, 1f, 1, 0, 1f)
    }

    fun release() {
        soundPool.release()
    }
}
The synthIds correspond to files in the res/raw folder, which I've named triangle_synth_#_[note].wav -- numbered from 1-8 and including the name of the note contained in the file. All we have to do after building a SoundPool is load() the raw resources, then play() the soundPoolIds returned from load(). We also need to release() the SoundPool when we're done with it. Developer documentation for SoundPool can be found here.
Why MIDI?
It's clear that compared to any sort of serious composition software, NoteGrid has enormous limitations. Just for example:
Only eight notes are available -- one scale, one octave, one instrument.
All notes are the same length.
The grid length is short and immutable.
We can't save and load sequences.
MIDI (Musical Instrument Digital Interface) is a standardized protocol for sending and receiving music events (such as a note starting, stopping, or being pitch-bent) in real time. For supporting a large number of synthesized instruments, it's the obvious solution. So when I came back to this code thinking that I'd like to build on what I learned from it and create something more full-featured, the first question I had was whether I could use and support MIDI data on Android.
This question is worth asking because the Android OS has its own runtime, separate from the JVM most commonly used on other platforms such as desktop PCs. Some features of Java, and many Java SDK packages, are simply not available on Android. For example, UI components must be created through XML layouts or with Compose, not Swing. So when working in a new domain, we can't take for granted that the tools we're used to will exist on the platform.
So how hard is it?
I had some research to do. Software teams often call a task like this a "spike," after the spikes used in rock climbing: driving a spike into a cliff doesn't gain you any altitude by itself, but it enables future climbing. Here the primary goal is to collect information and make decisions rather than to build a production-ready feature. The questions I wanted to answer were, on Android specifically:
Can I play a MIDI file?
Can I read and write MIDI files?
Can I play MIDI data in real time using a free or built-in soundfont (collection of synthesizers)?
The first thing I learned was that Android's default MediaPlayer class can be used to simply play a .mid
file. Doing this was as simple as adding a .mid
file to the res/raw
folder and playing it with MediaPlayer the same way you would an mp3. Under the hood it clearly has access to a soundfont.
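For reference, here's a minimal sketch of that approach. The resource name R.raw.demo_song is just a placeholder -- any .mid file dropped into res/raw works the same way:

import android.content.Context
import android.media.MediaPlayer

// Plays a bundled MIDI file through the platform's built-in synthesizer,
// exactly as you would play an mp3. R.raw.demo_song is a placeholder name.
fun playBundledMidi(context: Context): MediaPlayer {
    val player = MediaPlayer.create(context, R.raw.demo_song)
    player.setOnCompletionListener { it.release() }  // free the player once playback finishes
    player.start()
    return player
}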
However, there isn't a way to stream MIDI data to MediaPlayer in real time. That means it doesn't fully serve our purposes for this application, as any time we wanted to play a section of music or something as small as a single note that the user just entered, we'd need to write out a file and tell MediaPlayer to play it. Constantly writing temporary files to disk feels like the wrong approach and is not likely to be terribly performant. We also wouldn't be able to change the soundfont used, as far as I can tell.
The next official Android package I looked at was android.media.midi. At first this looked promising, as it claims to "allow users to … generate music dynamically from games or music creation apps." However, the package is focused on connecting to external MIDI devices and sending and receiving messages from them, rather than the use cases I'm most interested in, and there aren't abstractions for MIDI events -- you have to process raw byte arrays in order to implement a MidiReceiver. Nothing in there will parse a MIDI message and play it automatically using a built-in synthesizer. That said, there is some official sample code from Google showing how to build a synthesizer to act as such a device, which would be a pretty fun thing to experiment with some day! It's just not the problem I'm currently trying to solve. If I ever do add the ability to input notes or record tracks using an external MIDI device, though, this will be the library I use.
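To give a sense of the level this API operates at, here's a rough sketch of the kind of MidiReceiver you'd have to write. This is my own illustration, not code from the official samples, and it only decodes the bytes without producing any sound:

import android.media.midi.MidiReceiver

// Decodes incoming raw MIDI bytes by hand; nothing here produces audio.
class LoggingMidiReceiver : MidiReceiver() {
    override fun onSend(msg: ByteArray, offset: Int, count: Int, timestamp: Long) {
        val status = msg[offset].toInt() and 0xFF
        val command = status and 0xF0   // 0x90 = note on, 0x80 = note off, etc.
        val channel = status and 0x0F
        if (command == 0x90 && count >= 3) {
            val note = msg[offset + 1].toInt()
            val velocity = msg[offset + 2].toInt()
            println("NOTE_ON channel=$channel note=$note velocity=$velocity")
        }
    }
}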
It's also worth noting that this API is available in the Android NDK, meaning it can be used from C++ code. Use of the NDK could very well be important for a project like this in order to reduce latency. Going down to the level of C++ would make a number of other libraries available, too, surely including playback functionality, but would also add complexity and scope to the project. It should definitely be kept in mind as an option, but for now I want to continue working in Kotlin if possible. Investigating the C++ MIDI programming ecosystem is a good candidate for future research.
Information about the Native MIDI API is found here.
Unofficial libraries
When I read about playing MIDI on other platforms with the JVM, it became clear that javax.sound.midi
is a standard API for working with it. This is one of those JavaSE APIs that doesn't exist out of the box on Android, but Kaoru Shoji has made an Android port of it available on GitHub. They also provide USB and Bluetooth MIDI drivers.
Another library I looked at is android-midi-lib. The description really says everything you need to know:
This project is mainly for use with Android applications that do not have access to Java's javax.sound.midi library. However, it is a stand-alone Java library with no Android-specific dependencies or considerations.
This code provides an interface to read, manipulate, and write MIDI files. "Playback" is supported as a real-time event dispatch system. This library does NOT include actual audio playback or device interfacing.
I tried out the sample code to write a MIDI file of ascending notes and was able to play it back with MediaPlayer, but this still didn't show me how to use a soundfont for real-time audio playback.
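Going from memory, that ascending-notes sample translates to Kotlin roughly as follows. The class and method names come from android-midi-lib's README, so treat the exact signatures as approximate:

import com.leff.midi.MidiFile
import com.leff.midi.MidiTrack
import java.io.File

// Writes a short run of ascending notes to a .mid file. insertNote() queues both
// the NOTE_ON and NOTE_OFF events for us; MediaPlayer can then play the result.
fun writeAscendingNotes(output: File) {
    val track = MidiTrack()
    for (i in 0 until 16) {
        track.insertNote(0, 60 + i, 100, (i * 480).toLong(), 240L)
    }
    val midiFile = MidiFile(MidiFile.DEFAULT_RESOLUTION, arrayListOf(track))
    midiFile.writeToFile(output)
}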
What we need is a software synthesizer
I was getting closer now, and more precise with my searches. What I came across next was a library called MidiDriver written by Sherlock Jiang. The description reads: "Just a synthesizer for playing MIDI note on Android … soundfont2 file is supported." It includes a sample app that plays a piano note or a wood block sound when the user taps a button.
Kaoru Shoji is thanked in the readme for their work on the Android port of javax.sound.midi, the source code of which is included in MidiDriver along with that of sun.media.sound and javax.sound.sampled. sun.media.sound contains the main class we'll be interacting with as a public interface to those libraries: SoftSynthesizer.
The project hasn't been updated since 2017, so the version of javax.sound.midi it bundles is somewhat out of date. This also means that the sample app won't build in the latest version of Android Studio or run on recent devices without some modifications. Thankfully, Android Studio provides a number of prompts and utilities for automatically refactoring old projects, and with a bit of effort I was able to get it running. I've uploaded a fork with those updates here: https://github.com/kaelambda/MidiDriverUpdate
This seems to be exactly what I'm looking for. Unfortunately, there is some latency between touching one of the buttons and hearing the sound that wasn't there when using SoundPool. However, it's very slight, and could probably be compensated for by delaying the UI animation to match. So my next task is to use MidiDriver to play something more complex than a single piano note and see how it handles doing so in real time.
One downside of these libraries is that they're included directly in the project as source code, rather than published as releases that could be easily updated by changing a Gradle dependency. However, being ports of relatively old, stable core Java SDKs, I'm hoping there won't be much need to update them frequently. Still, this is a sign I'm using open-source software that's unlikely to be maintained, and if there are bugs in the ports I may need to resolve them myself or find alternatives.
A complete proof-of-concept
By incorporating the MidiDriver library into my project, I was finally able to get MIDI playback and instrument selection working. The code to do so is hardly any more complicated than the use of SoundPool, simply replacing SoundPool with the SoftSynthesizer class and loading in a soundfont:
val synth = SoftSynthesizer()
val soundbank = SF2Soundbank(appContext.assets.open("SmallTimGM6mb.sf2"))
synth.open()
synth.loadAllInstruments(soundbank)
synth.channels[0].programChange(0)
SmallTimGM6mb.sf2
is the same simple soundfont bundled with MidiDriver that only contains two instruments (piano and wood block). It's the only one I'll be checking into the repo for now since other soundfonts can be much larger in size, but .sf2
files are easy to find online from sites like Musical Artifacts.
Once we've loaded our instruments, we can see all their names through synth.loadedInstruments
and select one to use with the programChange()
function. Then we can play a note like this:
fun play(scaleDegree: Int) {
    val message = ShortMessage(ShortMessage.NOTE_ON, 0, getNote(scaleDegree), 127)
    synth.receiver.send(message, -1)
}
message
is a standard MIDI message containing a status byte and one or two data bytes. Our status byte will contain NOTE_ON
and the channel number, the first data byte represents the note to play, and the second data byte is the velocity value which controls the note's volume. Since the NoteGrid is meant to map to the notes of a single scale (natural minor starting from middle C), the getNote()
function maps a note's position on the y axis of the grid to the correct note within the scale. When we send this message to our synth's receiver, we use a timeStamp value of -1 so that the NOTE_ON
event occurs immediately. A similar function sends a NOTE_OFF
event when a note ends, without which many instruments will play indefinitely.
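For completeness, here's a sketch of what getNote() and that NOTE_OFF helper could look like. The offset values are my reconstruction from the note names of the eight .wav files used in the SoundPool version (C natural minor starting at middle C, MIDI note 60), not copied from NoteGrid's source:

// Maps a scale degree (0-7) to a MIDI note number in C natural minor from middle C.
private val scaleOffsets = intArrayOf(0, 2, 3, 5, 7, 8, 10, 12)

fun getNote(scaleDegree: Int): Int = 60 + scaleOffsets[scaleDegree]

// Counterpart to play(): without a NOTE_OFF, many instruments sustain forever.
// Most instruments ignore NOTE_OFF velocity, so 0 is fine here.
fun stop(scaleDegree: Int) {
    val message = ShortMessage(ShortMessage.NOTE_OFF, 0, getNote(scaleDegree), 0)
    synth.receiver.send(message, -1)
}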
After adding a dropdown for selecting an instrument from the soundfont and a switch to toggle between our SoundPool and MIDI implementations, we have a working proof-of-concept of MIDI playback on Android!
File output
As one final test, I wrote a function to write the contents of the grid out to a MIDI file:
private val midiFileWriter = StandardMidiFileWriter()
fun writeCompositionToMidiFile(noteGrid: Array<Array<Boolean>>, instrument: Int): File {
    val sequence = Sequence(Sequence.PPQ, RESOLUTION)
    val track = sequence.createTrack()

    track.add(
        MidiEvent(ShortMessage(ShortMessage.PROGRAM_CHANGE, instrument, 0), 0L)
    )
    for ((x, column) in noteGrid.withIndex()) {
        val notePosition = EIGHTH_NOTE * x.toLong()
        for ((y, enabled) in column.withIndex()) {
            if (enabled) {
                val onMessage = ShortMessage(ShortMessage.NOTE_ON, 0, getNote(7 - y), 127)
                track.add(MidiEvent(onMessage, notePosition))
            }
        }
        for ((y, enabled) in column.withIndex()) {
            if (enabled) {
                val offMessage = ShortMessage(ShortMessage.NOTE_OFF, 0, getNote(7 - y), 127)
                track.add(MidiEvent(offMessage, notePosition + EIGHTH_NOTE))
            }
        }
    }
    track.add(
        MidiEvent(ShortMessage(ShortMessage.STOP), EIGHTH_NOTE * xCount.toLong() + RESOLUTION)
    )

    midiFileWriter.write(sequence, 0, outputFile)
    return outputFile
}
Here the StandardMidiFileWriter from jp.kshoji.javax.sound.midi
is used. StandardMidiFileWriter requires us to build a Sequence, which needs to have at least one Track, which we add MidiEvents to.
A MidiEvent contains a MidiMessage and a timestamp, just like the ones we pass to our synth's receiver for playback, but here we have to actually calculate the timestamps rather than passing -1 for immediate playback.
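As an aside, the RESOLUTION and EIGHTH_NOTE constants aren't shown above. With PPQ timing they look something like this -- the specific resolution value here is my assumption, not necessarily what NoteGrid uses:

// With Sequence.PPQ, RESOLUTION is the tick count per quarter note, so each
// grid column lands an eighth note (RESOLUTION / 2 ticks) after the previous one.
private const val RESOLUTION = 480
private const val EIGHTH_NOTE = 240L   // RESOLUTION / 2, expressed in ticks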
It's important to realize that all the MidiEvents in a track need to be added sequentially with respect to their timestamps, which is why I have two separate for
loops for the y axis inside the loop for the x axis. If two notes play at the same time, then I add both their NOTE_ON
events before adding NOTE_OFF
events for them. Otherwise, by putting both events in the same loop I'd be adding a NOTE_ON
event at a given timestamp, then a NOTE_OFF
event for that note an eighth note farther ahead in time, then attempting to add a NOTE_ON
event for the second note back at the original timestamp. This results in the StandardMidiFileWriter going into an endless loop as it can't handle going backwards in time.
Placing a STOP
event a quarter note past the end of the grid's duration prevents StandardMidiFileWriter from automatically placing one immediately after the last NOTE_OFF
event, giving the last notes a bit of time to ring out before playback ends.
Here's a video of a short sequence being played first with the soundfont I've loaded into a SoftSynthesizer, then by MediaPlayer with the built-in soundfont on my device:
Since the two players are using different soundfonts, you can hear a big difference between the same 'program' (that is, selected instrument) in each. However, they still each follow the General MIDI instrument list, so they are simulating the same instrument.
What have we learned?
If we go back to my original definition of the "research spike" that inspired this post, I had three questions to answer:
Can I play a MIDI file?
Can I read and write MIDI files?
Can I play MIDI data in real time using a free or built-in soundfont (collection of synthesizers)?
Now that I've added these features to NoteGrid, I can definitively answer each of those with "Yes." More specifically, I've learned about a set of APIs that have been ported from the Java SDK to support MIDI playback on Android and have tested them enough to be confident they do what they say they will. This is good news, as it seems I have everything I need in order to build a more complex, production-ready application.
What's next?
NoteGrid is a fun toy and an effective proof-of-concept, and it was a good programming exercise and learning experience, but it's not an app with much serious utility. What it most resembles is a drum machine, and with some extension it could act as one. If the grid's length and the notes it plays were configurable, a BPM control were added, various synths were built in (drums as well as melodic instruments), soundfonts could be loaded dynamically, output to external MIDI devices were supported, and so on, then it could drive a backing track for a practicing musician or play looping sequences into external synths as part of a band's live show.
Another thing NoteGrid is missing is technical polish. For example, I completely ignored automated testing. That's okay for a proof-of-concept, but for a more serious application I should likely start fresh and spend more time on design and architecture up front. Another best practice I didn't follow is building the UI to be accessible from the very beginning: accessibility semantics would likely need to be customized for the editor to provide a good experience to users of TalkBack or Switch Access.
My original idea was to build a sequence editor with more features than a drum machine. Such an app would allow users to create multiple tracks, each assigned to a different instrument, and compose whole pieces of music of any length. I imagine myself using the app to experiment and write down ideas I have while traveling, then export them to .mid
and import them into a DAW or other composition software such as MuseScore or Guitar Pro once I'm back to a PC. Such a sequencer would also be able to function as a drum machine simply by allowing a piece to loop, so that could be a good use case to base early feature development on. I do want to be careful not to commit to too large of a project, so starting with simpler use cases is probably a good idea.
Since I ended up using classes from javax.sound
and com.sun.media.sound
to get MIDI synthesis working, a bit of homework for me is to read through the documentation of those packages more thoroughly, just as I've been going through the Jetpack Compose docs and taking notes. Otherwise, I could easily end up reimplementing functionality that already exists in what's been ported from the larger Java ecosystem. For example, an implementation of the Sequencer interface exists and includes playback and looping control, functions to mute and solo tracks, a way to set the tempo in BPM as well as a tempo factor, and startRecording and stopRecording functions. If I can properly leverage those libraries, most of what I'll be building is simply a UI to interact with them.
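As a taste of what leveraging that might look like, here's a sketch written against the standard javax.sound.midi Sequencer interface. I haven't verified how much of this the Android port wires up, so treat it as a reading of the upstream API rather than tested code:

import javax.sound.midi.Sequence
import javax.sound.midi.Sequencer

// Looping playback of a Sequence via the Sequencer interface (upstream javax.sound.midi
// semantics; how a Sequencer instance is obtained from the port may differ).
fun loopSequence(sequencer: Sequencer, sequence: Sequence, bpm: Float) {
    sequencer.open()
    sequencer.sequence = sequence
    sequencer.tempoInBPM = bpm
    sequencer.loopCount = Sequencer.LOOP_CONTINUOUSLY
    sequencer.start()
}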
I'm quite happy that NoteGrid works as well as it does. I went a bit farther with it than I strictly needed to just to test out some libraries and play with Compose, but I'm glad that I did -- both so that I have a more polished demo to show here and because I learned something new every time I added a new feature or decided to fix a bug or performance issue.
To continue down this road, what I should do next is prepare for a more serious project in a more serious way:
Document the requirements for the app I want to build and create mockups of the UI.
Design the app's architecture -- diagram what modules, packages, and important classes I'll need to write.
Make decisions about construction practices such as automated testing.
Here's the full source code for NoteGrid again. If you made it this far, thanks for reading! I plan on continuing to learn about music programming and sharing what I find here.