Multi-modal user interfaces let users interact with digital systems through touch, images, video, and real-time speech, whichever suits the user's needs at a given moment. In the AI era, this matters because it enables richer data input and real-time AI assistance for field work and many other business scenarios where traditional computer use is too difficult or ineffective.
Now that AI-assisted development has become a reality, there is little reason left to rely on often clumsy SaaS systems, which typically provide siloed AI capabilities. Building custom multi-modal solutions, integrated with master systems, has become increasingly attractive.
A field work scenario – illustrative example
Consider a construction site where a user is examining damage. The user walks around outside with a mobile app that guides the process and provides visual guidance and instructions when needed. The app records any findings worth noting:
- It listens to the user's speech and answers with real-time voice, turning spoken instructions into actions and checkmarks in the visual user interface.
- Uploaded video and image material is automatically processed in the background. Once ready, the analyzed outcomes are transformed into actions and checkmarks in the user interface, just as with voice.
- Touch can be used where needed, since noise on site can sometimes make speech and audio impractical.
Once the observation is ready, the app agent generates a summary with suggested remedial actions. When approved, the agent goes on to send email messages, create service requests, or do whatever else is needed to finalize the job. And this is what agents should really be about: doing intelligent things and integrations based on what we tell and show them, right? No need for additional work at the office. In addition, the multi-modal solution helps collect rich data to support any future AI-assisted analysis.
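To make the scenario a little more concrete, here is a minimal TypeScript sketch of how findings from speech, media analysis, and touch could all land in the same checklist and feed the final summary. Every name in it (Modality, Observation, ChecklistItem, applyObservation, summarize) is hypothetical and invented for illustration. It does not describe any particular product or SDK, and the speech recognition and image analysis themselves are assumed to happen elsewhere.

```typescript
// Hypothetical types for illustration only; not taken from any real SDK.
type Modality = "speech" | "media" | "touch";

interface Observation {
  modality: Modality; // which input channel produced the finding
  itemId: string;     // checklist item the finding refers to
  note: string;       // transcript, analysis result, or typed text
}

interface ChecklistItem {
  id: string;
  label: string;
  done: boolean;
  notes: string[];
  sources: Modality[]; // kept so the user can see where each note came from
}

// Every modality goes through the same update path, so the visual
// checklist stays the single source of truth. Returns a new array.
function applyObservation(
  items: ChecklistItem[],
  obs: Observation
): ChecklistItem[] {
  return items.map((item) =>
    item.id === obs.itemId
      ? {
          ...item,
          done: true,
          notes: [...item.notes, obs.note],
          sources: [...item.sources, obs.modality],
        }
      : item
  );
}

// Builds the plain-text summary the user approves before anything is sent.
function summarize(items: ChecklistItem[]): string {
  const done = items.filter((item) => item.done);
  const lines = done.map((item) => `${item.label}: ${item.notes.join("; ")}`);
  return `${done.length}/${items.length} items recorded\n${lines.join("\n")}`;
}

// Example usage with made-up data.
let siteChecklist: ChecklistItem[] = [
  { id: "roof", label: "Roof", done: false, notes: [], sources: [] },
  { id: "facade", label: "Facade", done: false, notes: [], sources: [] },
];
siteChecklist = applyObservation(siteChecklist, {
  modality: "speech",
  itemId: "roof",
  note: "Crack near the chimney",
});
siteChecklist = applyObservation(siteChecklist, {
  modality: "media",
  itemId: "facade",
  note: "Photo analysis: water damage",
});
console.log(summarize(siteChecklist));
```

The design choice worth noting is that voice, background media analysis, and touch are only different producers of the same kind of event. That keeps the user interface consistent no matter which channel the user happens to prefer at the moment.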
AI-assisted development is the enabler
Are multi-modal apps the future? No, this is doable right here and now. We have all used these features with Copilots, ChatGPT and other AI tools – we know that AI can speak and deal with images, even video.
So, how expensive is it to include such features in an app customized to our particular needs? Actually, not that expensive, as we can develop faster with AI and modern voice technology.
With real coding skills boosted by AI, business-critical custom solutions are now much faster to prototype and test, and yes, they can be taken to production in a reliable manner. Integrations and business process streamlining always take time and money, but in 2026 we are really seeing AI-assisted development challenge monolithic legacy systems. AI boosts the speed of development, and modern AI-assisted software is about clean APIs, modular code, and well-documented architectures: exactly the things many legacy systems lack.
In short, AI, both as a development assistant and as a solution component, makes it possible to build exactly the kinds of systems that our employees and customers need for our processes to perform.
Considerations for successful outcomes
Firstly, with AI, it is more important than ever to understand the objectives of the change. Don’t forget that. And yes, it’s an excellent idea to use AI to find ways to streamline the process and understand stakeholder needs.
Secondly, the UX of a multi-modal app will require design. AI helps boost the design process, but details are crucial in building a fluent “cross-modality” user experience that users can trust. For example, it is crucial to provide timely visual feedback for speech instructions and to support easy recovery from errors. Yes, with AI and voice, there will be errors.
Thirdly, let's not forget that the AI processing of data must be transparent. Real-time speech adds new challenges: one must provide clear textual transcriptions and ways to recover from errors and faulty AI interpretations.
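As one possible way to approach the feedback and recovery points above, the sketch below keeps a visible transcript of what the app heard and lets the user undo the last voice-driven change. It is a simplified illustration under stated assumptions, not a definitive implementation: VoiceCommand, AppState, and VoiceSession are hypothetical names, and the speech-to-text step is assumed to have already produced the transcript and the interpreted checklist item.

```typescript
// Hypothetical types for illustration only; not taken from any real SDK.
interface VoiceCommand {
  transcript: string; // the text shown to the user exactly as it was heard
  itemId: string;     // the checklist item the AI believes was meant
}

interface AppState {
  checked: string[];       // ids of checked items
  transcriptLog: string[]; // visible history of what the app heard and did
}

class VoiceSession {
  private state: AppState = { checked: [], transcriptLog: [] };
  private history: AppState[] = []; // earlier states, newest last

  // Applies a voice command and logs the transcript for the user to verify.
  apply(cmd: VoiceCommand): AppState {
    this.history.push(this.state);
    const alreadyChecked = this.state.checked.includes(cmd.itemId);
    this.state = {
      checked: alreadyChecked
        ? this.state.checked
        : [...this.state.checked, cmd.itemId],
      transcriptLog: [...this.state.transcriptLog, `Heard: "${cmd.transcript}"`],
    };
    return this.state;
  }

  // Reverts the last change but keeps the transcript log, so the
  // misinterpretation and its correction both stay visible.
  undo(): AppState {
    const previous = this.history.pop();
    if (previous !== undefined) {
      this.state = {
        checked: previous.checked,
        transcriptLog: [...this.state.transcriptLog, "Undone by user"],
      };
    }
    return this.state;
  }
}

// Example usage: the AI misheard the item, and the user takes it back.
const session = new VoiceSession();
session.apply({ transcript: "mark the roof as inspected", itemId: "roof" });
console.log(session.undo());
```

Keeping the log entries after an undo is a deliberate choice in this sketch: the user can see what the AI understood as well as what ended up in the record.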
Finally, the point of this article is to suggest that in 2026, custom solutions are a tremendous opportunity for business process digitalization. With AI, they are markedly faster to create and, when properly developed, easier to maintain. This is what AI now enables us to do. Following a recent announcement, modern web apps can also be run inside Power Platform, which provides an excellent governance framework around newly developed custom solutions.
If you want to see a live demo of multi-modal power and discuss how to create modern custom solutions on your business-critical data, please get in touch or send us a message!