
AI is Changing the Usability of Enterprise IT

Generative AI is transforming enterprise IT into a more task-centric, personalized, and truly multimodal experience. As AI agents learn from users and their work environments, IT starts acting less like a tool and more like an assistant – working in the background, asynchronously, and in service of people.
Jouni Heikniemi

A conversational interface such as chat is attractive because it raises the status of the machine to the level of our sci‑fi fantasies: if you can converse with it in natural language about anything, the device inevitably feels like a smart partner. From the point of view of enterprise AI, however, chat is only a small part of the shift that AI brings to the relationship between employees and IT.

The first big AI question for enterprise architecture is: where does the AI actually live? Will an “AI layer” emerge around the organization’s data warehouse, will AI become part of ERPs, or will it fragment into all the little applications users pick up?

Most companies will likely end up with a mix of all these forms. From the user’s perspective, the change will feel demanding, because merging AI into everyday work requires new skills, a new attitude towards IT, and a new understanding of the human role. You can still teach someone to use a single application, but what about when everything changes?

Task‑centricity shakes up the entire app culture

One of AI’s greatest promises is enabling a truly company‑wide, task‑centric user experience. Task‑centricity means the user tells the machine what they want to do, rather than seeking out an application to do it. Everything starts with simple requests:

  • “Open last week’s sales report” finds the right view in HubSpot or the correct Power BI report.
  • “Create an expense report for last week’s business trip to Stockholm” recognizes the right calendar events, fetches the necessary receipts from email, and assembles the paperwork ready for expense‑report approval.
  • “Schedule 1:1 meetings with everyone from the Terva‑project steering group” knows who you mean and initiates the calendar scheduling.

Task‑centricity rests on two big building blocks. First, the user needs an AI interface directly on their device where they can make requests. Such assistant‑like experiences are being sketched out today on Windows, iOS, and Android, even though none of them are quite mature yet. The bigger challenge, however, is the second missing piece: integrating those AI interfaces with enterprise IT requires some kind of agent catalog against which requests are interpreted. The catalog must answer questions such as how expense reports are done in the organization or where to find that sales report.
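As a rough illustration of what such a catalog could look like, here is a minimal sketch in Python. The entry structure, field names, and the naive keyword matching are all hypothetical assumptions; a real implementation would resolve requests with an LLM or embeddings against a centrally maintained catalog.

```python
# Hypothetical sketch of an enterprise "agent catalog": each entry describes a
# task the organization supports and which system actually performs it.
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    task: str                    # what the user wants to do
    system: str                  # the application that handles it
    description: str             # how this task is done in this organization
    keywords: list[str] = field(default_factory=list)


AGENT_CATALOG = [
    CatalogEntry(
        task="open_sales_report",
        system="Power BI",
        description="Weekly sales reports live in the sales workspace.",
        keywords=["sales report", "last week", "revenue"],
    ),
    CatalogEntry(
        task="create_expense_report",
        system="Expense tool",
        description="Expense reports are assembled from calendar events and emailed receipts.",
        keywords=["expense report", "business trip", "receipts"],
    ),
]


def resolve(request: str) -> CatalogEntry | None:
    """Very naive keyword match; a real assistant would use semantic matching."""
    text = request.lower()
    for entry in AGENT_CATALOG:
        if any(kw in text for kw in entry.keywords):
            return entry
    return None


match = resolve("Open last week's sales report")
if match:
    print(f"Routing to {match.system}: {match.description}")
```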

Ideally, the user wouldn’t even need to know which applications are in use; AI can use them all on their behalf. For example, the sales‑report view could simply appear on the screen without the actual reporting app, e.g. Power BI, ever being visible. In more complex data‑input situations, such as budgeting, the user’s sense of control and productivity still requires knowledge of specialized applications. Even then, AI can support productivity, for instance with a request like “Import the sales forecasts Antti sent by email yesterday into next year’s budget template.”

Personalization brings freshness to basic use cases

Using IT by simply making requests becomes more attractive the more those requests raise the abstraction level of working with the machine. Nobody needs a solution where you instruct the computer to “Write the number 412 into cell G7”. Value emerges when an agent – aware of both the application environment and the user – can execute complex work steps in one go, leveraging data from multiple sources.

For example, the request “Register me for Microsoft’s Ignite conference” is already more interesting: the agent needs only the user’s wish, and can handle the task quite independently. Names, addresses, dietary preferences and such are likely already known by the agent. Some confirmation prompt will probably be needed, for example: “The registration fee is $2,325. Charge this to your credit card?”

But when an agent knows the user and their context, it could help even further. Once registration is done, a natural follow-up question could be: “Do you also want me to book flights and accommodation? Here are two hotel options we recommend. Last year you stayed at hotel X.”

You don’t need massive data sets to enable useful personalization. The agent can learn quickly from the user, and even when there is little direct data, it can be complemented with general corporate conventions or, as in the case of hotel recommendations, with public hotel reviews from the web. The strength of generative AI solutions lies precisely in this flexibility. With even modest personalization, the user experience begins to shift: IT no longer feels like filling out a form but like interacting with an assistant.

Multimodality: The machine reaches out into real life

For a long time, human-to-IT interaction has been largely defined by screen, keyboard, mouse – and more recently, touchscreens. Yet many jobs rely heavily on information delivered through sight and sound. Generative AI drastically lowers the cost of voice and image communication, enabling a broader range of interface technologies in any application. And it’s not just about voice control.

Take, for example, improving phone‑based customer service: the AI assistant could be more than just a chatbot aimed at the support agent. It could listen in on the customer call directly and build a real‑time case description that not only produces the necessary customer‑record entries but also offers the service agent real‑time context: customer history, possible solutions, and next‑steps guidance.

Computer vision, in turn, can interpret the physical environment. For instance, when inspecting the condition of a home or office space, you could record a video of the space, and a large part of the issues could be detected automatically from the footage. By speaking as you go, you could also generate task notes linked to visual observations, e.g. “This wall needs to be painted with our brand‑green.”

We are approaching a situation where producing and enriching visual material becomes so real-time and high quality that it supports actual work. A question like “How would this wall look if we painted it with our logo?”, relevant in space planning, is gradually becoming feasible.

What matters is considering IT usability through all human senses. In open-plan offices, using voice might be problematic, but there are uses in field work and anywhere mobile devices are relevant. Free-form speech for capturing structured information is a good example: whether it’s writing a travel‑expense description, dictating a prescription, or reporting the condition of a cleaning target, it spares the user’s nerves by letting them express thoughts naturally rather than fight with forms.
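As a minimal sketch of what capturing structured information from free-form speech could look like, the Python below turns an already transcribed travel-expense dictation into the fields a form would otherwise require. The `complete` function is a stand-in for whichever language-model service is actually in use; the prompt and field names are illustrative assumptions, not a real API.

```python
import json

# Illustrative only: `complete(prompt)` stands in for a call to whatever
# language-model service the organization uses; transcription is assumed done.
def extract_expense_fields(dictation: str, complete) -> dict:
    """Turn a free-form travel-expense dictation into structured form fields."""
    prompt = (
        "Extract the following fields from the travel description as JSON: "
        "destination, start_date, end_date, purpose, "
        "expenses (list of {item, amount}).\n\n"
        f"Description: {dictation}"
    )
    return json.loads(complete(prompt))


# Example dictation, as a user might speak it while walking out of a meeting:
dictation = (
    "I was in Stockholm Monday to Wednesday for the partner workshop, "
    "flights were 280 euros and the hotel 340 euros."
)
# fields = extract_expense_fields(dictation, complete=my_llm_call)
# -> {"destination": "Stockholm", "expenses": [...], ...}
```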

Asynchronicity changes our sense of time

Modern work life demands a lot from human brains. One of AI’s biggest productivity promises is that it will let people focus on what really matters, as more and more routine tasks are automated. In this vision, AI must start to act more like an assistant than a tool. The human must stay in the driver’s seat, yet the scope and complexity of tasks handed to the assistant will grow as its abilities increase.

As tasks get more complex, their execution time grows longer too. You can expect speed from a machine, but not boundless speed: for example, “Find the cheapest place to print 500 T-shirts in Helsinki” may still be a multi‑minute research task even for AI. If a task involves human components (“Compare options for conducting a customer‑satisfaction survey”) or deliberate response delays (“Ask my direct reports to evaluate my leadership, then summarize the results”), the answer may not come until days or weeks later.

Asynchronicity means intentionally leaving work in the background, returning to it when it’s done or when human input is needed. Future AI solutions will lean heavily on asynchronicity: the user isn’t expected to sit in front of the agent interface, but the virtual colleague carries out tasks in the background and reports in when appropriate. Some notifications will come when tasks are complete, while others will be human‑in‑the‑loop prompts: “Is it OK if I plan the production‑line downtime for week 41?”
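A minimal sketch of this asynchronous, human-in-the-loop pattern, assuming hypothetical task and notification functions: the agent works in the background, pauses at the point where human judgement is needed, and reports back once the decision is made.

```python
import asyncio


async def ask_human(question: str) -> bool:
    """Stand-in for a notification channel (chat, email, mobile push)."""
    print(f"[notification] {question}")
    await asyncio.sleep(1)  # in reality, the answer may come hours or days later
    return True


async def plan_production_downtime(week: int) -> None:
    # 1. Do the background research without blocking the user.
    print(f"Analysing production schedules for week {week}...")
    await asyncio.sleep(1)

    # 2. Pause at the checkpoint that needs human approval.
    ok = await ask_human(
        f"Is it OK if I plan the production-line downtime for week {week}?"
    )

    # 3. Continue or stop based on the decision, then report back.
    if ok:
        print(f"Downtime plan for week {week} drafted; details sent for review.")
    else:
        print("Plan cancelled; no changes made.")


asyncio.run(plan_production_downtime(41))
```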

The change is demanding because knowledge workers already suffer from overload caused by various notifications. However, as agents mature, this shift can turn positive – unnecessary messages disappear, and material from an agent begins to feel as relevant as a message from the best colleague. But the bigger the agent army each person controls, the more notifications will inevitably arrive. Oversight and monitoring tools must evolve alongside agent‑driven work, so that people don’t become the bottleneck of progress.

What can I do right now?

Most of the agent technologies described in this article exist in theory, but in practice they are still so unfinished or difficult that implementing a real company‑wide solution is challenging. Yet you can start preparing for the coming agent future with a few small actions today.

Begin preparing for task‑centricity by cataloguing your company’s existing AI capabilities in a structured, standardized way, even if only in a Word document. Someday those capabilities may be much easier to mobilize across the whole organization. The ideal would be to offer tailored functions from a centralized catalog to both desktop and mobile devices. Companies like Microsoft, Google and Apple often take years to converge on a common vision for user interfaces, so unified extension mechanisms will certainly take a few more years. But smaller, more limited implementations can already be built with custom lightweight interfaces; it’s not necessary to commit everything to platforms like Copilot or Siri.
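As a starting point for such a catalog, even a lightweight structured record goes a long way. The fields below are one possible, hypothetical layout rather than a standard; keep whatever your organization can realistically maintain.

```python
# A lightweight, structured inventory of existing AI capabilities.
# Field names are illustrative; the point is consistency, not the format.
ai_capabilities = [
    {
        "name": "Sales report summarization",
        "owner": "Sales operations",
        "systems": ["Power BI", "HubSpot"],
        "inputs": "Weekly sales data",
        "outputs": "Natural-language summary for management",
        "maturity": "in production",
    },
    {
        "name": "Expense report drafting",
        "owner": "Finance",
        "systems": ["Email", "Calendar", "Expense tool"],
        "inputs": "Calendar events, emailed receipts",
        "outputs": "Draft expense report awaiting approval",
        "maturity": "pilot",
    },
]
```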

Furthermore, start combining speech and video with user interfaces today. It’s worth starting now: multimodality is a big shift in how people work, and it takes practice and experimentation. (E.g., “What kind of information should we try to capture from speech?” / “How good is the image quality we can realistically get of building structures?”) The collected material, whether speech or images, will often be useful if you eventually end up training or customizing your own AI model. Training data often requires thousands of samples, so it helps a lot if collection has already begun.

In personalization, data quality becomes central: the more effortlessly the agent is expected to serve the user, the better it must know the user’s tasks and context. Structured data supports both high‑quality AI reasoning and a good user experience in the future.

Therefore, you can approach process development now with AI in mind: Which parts of each process could later be handed off to AI? Which parts could be enhanced by AI? In which situations should humans continue to act, perhaps now with better information than before? It’s worthwhile to reflect on this in enterprise architecture as well: a modular application structure not only creates flexibility for future AI development, but also allows the division of labor between humans and machines to be shifted flexibly in line with evolving organizational needs and technology.


If you want to dive deeper, download our Face of AI whitepaper or contact us!


Jouni Heikniemi

Jouni is the Chief Technology Officer at Norrin. He also serves as the CEO of the subsidiary Devisioona. Jouni is a seasoned software professional and one of the 200 Microsoft Regional Directors worldwide.

