Your first line gets 30 to 50 tickets a day, and it feels like two thirds of them are the same frustration: forgotten password, printer offline, missing access to folder X, Outlook won't start. This page describes how to automate triage far enough that your team has time again for the tricky 20 percent, while a human stays in the loop where it counts.
Does this sound familiar?
- Your first-line staff spend 70 percent of their time on the same routine: classify, assign, type a standard answer, escalate to second line or resolve it themselves. The same procedure, fifty times a day.
- Tickets land in the central mailbox or the ticket system and sit there for four hours because nobody is triaging right now. The requester assumes nobody cares.
- Three standard answers get copied so often that by now six slightly different versions exist, each with its own typos.
- Urgent tickets get lost because at a glance they look identical to routine tickets. Management concludes that “IT is overloaded”, when the real issue is a process problem.
- When someone on first line is sick or on holiday, the inbox visibly piles up, because there is no backup and the knowledge of what goes where isn't documented.
Why solve this now instead of postponing it
- Headcount grows, the IT team doesn't. Mid-market IT teams are thinly staffed. Every additional employee generates tickets on average, yet nobody hires an additional first-line role because of that. At some point the ratio tips.
- Good people quit when all they do is reset passwords. First-line work is an entry-level role, not a life's calling. If you want to keep good people there, you have to keep routine off their backs.
- LLM-based classification now works reliably enough for the Mittelstand. Three years ago this was experimental; today it is standard operational technology, provided you know its limits.
How it would look at your company
Step 1 — Analyse the ticket history (week 1–2)
We look at the last 6–12 months of your tickets: what the most common categories are, what the most common resolutions are, where handoffs between first and second line stall, and what gets misclassified and ends up in loops. The result is a map of which ticket types are automatable, which are semi-automatable, and which must stay human.
Stack: API of your ticket system (Jira Service Management, Zammad, OTRS/Znuny, Freshservice, ServiceNow or Microsoft Dynamics Customer Service — depending on what’s running at your company). Analysis in Python, result as a compact report for management and IT leadership.
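To make the analysis step concrete, here is a minimal sketch of pulling the ticket history, assuming a Zammad instance reachable over its REST API; host, token, and the tallying by group are placeholders, and other systems (Jira, Freshservice, ServiceNow) need their own endpoints.

```python
# Minimal sketch: pull recent tickets from a Zammad instance and tally them.
# Host and token are placeholders; endpoint and auth format follow the
# Zammad REST API. Adjust for whatever ticket system you actually run.
import collections
import requests

BASE_URL = "https://helpdesk.example.com"  # your Zammad host (placeholder)
HEADERS = {"Authorization": "Token token=YOUR_READONLY_TOKEN"}

def fetch_tickets(pages: int = 50, per_page: int = 100) -> list[dict]:
    """Page through the ticket list until an empty page comes back."""
    tickets: list[dict] = []
    for page in range(1, pages + 1):
        resp = requests.get(
            f"{BASE_URL}/api/v1/tickets",
            params={"page": page, "per_page": per_page},
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        tickets.extend(batch)
    return tickets

tickets = fetch_tickets()
# First, crude map: which groups (and therefore topics) dominate the volume?
by_group = collections.Counter(t.get("group_id") for t in tickets)
print(by_group.most_common(10))
```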
Step 2 — Build the classifier (week 2–4)
We build a classifier that assigns each incoming ticket to a category (password, hardware, software, permission, other), estimates an urgency (standard, urgent, critical) and proposes who should handle it. The classifier runs on a language model with your own category schema, not a generic off-the-shelf “IT classes” schema.
Stack: Azure OpenAI in your Azure tenant, classification prompt with few-shot examples from your real tickets, optional fine-tuning via evaluation runs.
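A minimal sketch of what such a classification call can look like, using the official OpenAI Python SDK against Azure OpenAI; the deployment name, the schema values, and the single few-shot example are placeholders, and in practice the examples come from your real, anonymized tickets. The confidence field is self-reported by the model here, which is one simple option; Step 4 describes how it gets used.

```python
# Minimal classification sketch against Azure OpenAI (openai>=1.0 SDK).
# Deployment name, schema values, and the few-shot example are placeholders.
import json
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",  # placeholder
    api_key="...",
    api_version="2024-06-01",
)

SYSTEM = (
    "Classify the IT ticket. Reply as JSON with keys: "
    "category (password|hardware|software|permission|other), "
    "urgency (standard|urgent|critical), "
    "suggested_group (string), confidence (number between 0 and 1)."
)

# Few-shot examples drawn from real, anonymized tickets (one shown here).
FEW_SHOT = [
    {"role": "user", "content": "Cannot log in to the VPN since this morning."},
    {"role": "assistant", "content": json.dumps({
        "category": "password", "urgency": "urgent",
        "suggested_group": "first-line", "confidence": 0.9})},
]

def classify(ticket_text: str) -> dict:
    resp = client.chat.completions.create(
        model="your-deployment-name",  # the Azure deployment, not a model ID
        messages=[{"role": "system", "content": SYSTEM},
                  *FEW_SHOT,
                  {"role": "user", "content": ticket_text}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)
```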
Step 3 — Auto-assignment and first-response proposal (week 4–6)
For clear categories the ticket is automatically assigned to the responsible group, and a proposed first response is generated, based on existing solution articles or on similar, already resolved tickets. Important: in the first weeks the first response does NOT go out automatically. A human reviews the proposal and either sends or corrects it. Human in the loop is the default, not the exception.
Stack: webhook or trigger in your ticket system, processing via Azure Function or Power Automate, response-proposal generation via Azure OpenAI, optionally connected to your internal knowledge base via the RAG search.
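A sketch of the webhook step as an Azure Function (Python v2 programming model), assuming Zammad again; classify() is the Step 2 sketch, and draft_reply() is a hypothetical helper standing in for the response-proposal generation. Note that the draft lands as an internal note, so nothing reaches the end user until a human presses send.

```python
# Sketch: webhook receiver that assigns confident tickets and attaches a
# draft reply as an INTERNAL note. Endpoints follow the Zammad REST API.
import json
import requests
import azure.functions as func

from classifier import classify  # the Step 2 sketch, saved as classifier.py

BASE_URL = "https://helpdesk.example.com"  # placeholder, as in Step 1
HEADERS = {"Authorization": "Token token=YOUR_AGENT_TOKEN"}

def draft_reply(ticket: dict, result: dict) -> str:
    # Hypothetical helper: in the real setup this asks Azure OpenAI for a
    # proposal, grounded in solution articles or similar resolved tickets.
    return f"Draft first response for a {result['category']} ticket."

app = func.FunctionApp()

@app.route(route="ticket-webhook", methods=["POST"])
def ticket_webhook(req: func.HttpRequest) -> func.HttpResponse:
    ticket = req.get_json()
    result = classify(ticket["title"] + "\n" + ticket["body"])

    if result["confidence"] >= 0.8:  # threshold calibrated in Step 4
        # Assign the ticket to the proposed group.
        requests.put(
            f"{BASE_URL}/api/v1/tickets/{ticket['id']}",
            json={"group": result["suggested_group"]},
            headers=HEADERS, timeout=30,
        )
        # Attach the proposed first response as an internal note only.
        requests.post(
            f"{BASE_URL}/api/v1/ticket_articles",
            json={"ticket_id": ticket["id"], "internal": True,
                  "body": draft_reply(ticket, result)},
            headers=HEADERS, timeout=30,
        )
    return func.HttpResponse(json.dumps(result), mimetype="application/json")
```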
Step 4 — Escalation logic with judgement (week 5–7)
Tickets for which the classifier is unsure (low confidence) go back into the human inbox and are NOT automatically assigned. Better to sort one ticket manually than chase ten wrongly assigned ones as they wander through every team. We calibrate the confidence threshold so that wrongly automated cases stay the exception.
Stack: confidence score from the classifier, fallback rule in the trigger, logging to a dashboard view for IT leadership.
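One simple way to do that calibration, sketched below under the assumption of a labeled evaluation set: walk the possible cutoffs from low to high and keep the lowest one at which auto-assigned tickets hit a target precision. Field names and the 95 percent target are illustrative.

```python
# Sketch: pick the lowest confidence cutoff at which tickets assigned
# automatically are almost always correct. Inputs come from evaluation runs
# where a human recorded whether the classifier's proposal was right.
def calibrate_threshold(evals: list[dict],
                        target_precision: float = 0.95) -> float:
    """evals: [{"confidence": 0.87, "correct": True}, ...]"""
    for cutoff in [round(0.05 * c, 2) for c in range(10, 20)]:  # 0.50..0.95
        auto = [e for e in evals if e["confidence"] >= cutoff]
        if not auto:
            continue
        precision = sum(e["correct"] for e in auto) / len(auto)
        if precision >= target_precision:
            return cutoff  # lowest cutoff meeting the target = most automation
    return 1.0  # no workable cutoff: leave everything with a human for now
```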
Step 5 — Try it out, measure, expand (week 6–10)
We start with one category, for example password resets and standard access requests, and measure for four weeks: how many tickets were correctly pre-classified, how often the proposed first response had to be edited, and how quickly tickets got their first reaction. Only when the numbers are right do we extend to further categories. The goal is not “automate everything” but “recognize the routines and have them handled cleanly”.
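What that four-week measurement can look like, assuming a simple correction log in which each entry records what the classifier proposed and what the human actually did; field names are illustrative.

```python
# Sketch: the three pilot metrics from a per-ticket correction log.
from statistics import median

log = [
    {"predicted": "password", "final": "password",
     "reply_edited": False, "minutes_to_first_reaction": 4},
    # ... one entry per ticket in the pilot category
]

accuracy = sum(e["predicted"] == e["final"] for e in log) / len(log)
edit_rate = sum(e["reply_edited"] for e in log) / len(log)
first_reaction = median(e["minutes_to_first_reaction"] for e in log)

print(f"pre-classification accuracy: {accuracy:.0%}")
print(f"first-response edit rate:    {edit_rate:.0%}")
print(f"median first reaction:       {first_reaction} min")
```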
What you should look out for along the way
- Ask to see the human-in-the-loop concept before anyone sells you “fully automatic” reply sending. An AI that communicates directly with end users without human approval during the rollout phase is a reputational risk. A poorly worded answer to an annoyed end user is worse than a slowly answered ticket.
- Clarify how misclassifications are handled. No classifier has 100 percent accuracy. What matters is that wrong assignments become visible quickly and feed a correction loop, not that they get swept under the rug.
- Watch out for vendors selling you a ready-made “IT helpdesk AI” without having seen your ticket history. Your categories, your language, your standard solutions are specific. An off-the-shelf classifier rarely matches your reality.
- Make sure the ticket system is sufficiently API-capable in the first place. If your system only works via mail or a rigid web form and offers no webhooks or API triggers, the automation is more involved than expected. Sometimes the first step is not automation but switching or updating the ticket system.
What realistically changes afterwards
- First line spends considerably less time sorting and typing and more time on the tickets that really need attention.
- End users get a faster first reaction — even when nobody is actively triaging right now, because the auto-note “Your ticket has been assigned to group X, expected handling by Y” goes out immediately.
- Urgent tickets are recognized as such and moved to the front, not buried in arrival order.
- Standard answers become consistent: no more six variants of the same text with different typos.
- IT leadership gets, for the first time, solid figures on the most common ticket causes, and with them a basis for deciding where it is worth fixing a problem at the root instead of answering tickets about it every time.
What you contribute
- Access: read access to the ticket history (anonymized where personal data is sensitive) and administrative access to the ticket system for setting up triggers.
- Stakeholder time: the person who runs or leads first line today, estimated at 4–6 hours in the analysis phase, then 1–2 hours per week during the initial weeks. Without this knowledge the classifier becomes generic, and generic means bad.
- Works council and data protection: AI-supported processing of employee tickets is relevant for codetermination and data protection. We provide the technical description; you take it to your committees.
- Willingness to consolidate standard answers. The AI can suggest answers, but only on the basis of what is already well worded at your company. If there are no good templates, the first deliverable of the project is, paradoxically, a tidied-up knowledge base.
Risks & when it does NOT fit
- If your ticket volume is under 10 per day and the routine share is small, the investment doesn't pay off; better templates or a clearer triage protocol are the stronger lever.
- If the ticket system is a black box and no API access is possible. Then sort out the system first and automate second.
- If the expectation is that AI replaces the entire first line. It doesn't. It takes the routine off their plate and makes the job more interesting. Anyone who wants to save money by cutting roles should be honest enough to say so at project start, not claim afterwards that “the AI decided it”.
- If the data protection framework for AI-supported processing of employee enquiries isn't in place at your company. That can be sorted out, but not in two days, and not by IT alone.
How the conversation starts
30 minutes initial conversation, free of charge, by video or phone. What we clarify: which ticket system you run, how many tickets arrive per day, how first line is staffed, which topics feel like they repeat most often, and what the current trigger is (staff shortage, complaints about response time, growth). From that picture it becomes clear whether a classifier project is the right step, or whether something else would bring more as a first step, for example tidying up the standard answers.
Enquiries get a prompt response during service hours, handled remotely. The initial conversation is typically scheduled within 3–5 working days, depending on my current workload; that's the honest answer in a solo operation.
Frequently asked questions
What if the AI misclassifies a ticket? In the early weeks that will definitely happen. That’s why in the first weeks a human goes over the classifications, corrects obvious errors, and these corrections flow back into tuning. Only when the hit rate is stably above 90 percent for a category do we loosen the human-in-the-loop obligation for that category. Other categories remain under human supervision.
Will Microsoft or OpenAI then read along with our tickets? With Azure OpenAI in your own Azure tenant, the Microsoft Enterprise terms apply: your data is not used to train models and stays in the chosen Azure region. Tickets with particularly sensitive content (HR matters, health data) can be filtered out or anonymized before AI processing.
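What such a pre-filter can look like, as a minimal sketch with keyword and regex rules; the patterns are purely illustrative, and a real deployment would use a filter policy agreed with your data protection officer.

```python
# Sketch: route sensitive tickets past the AI entirely. Patterns are
# illustrative; the real list comes from your DPO and works council.
import re

SENSITIVE_PATTERNS = [
    re.compile(r"\b(salary|payroll|sick note|diagnosis)\b", re.IGNORECASE),
    re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b"),  # date-like strings, DD.MM.YYYY
]

def is_sensitive(text: str) -> bool:
    return any(p.search(text) for p in SENSITIVE_PATTERNS)

def route(ticket_text: str) -> str:
    if is_sensitive(ticket_text):
        return "human-only"  # never sent to the classifier
    return "classify"        # safe for AI pre-processing
```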
Do we need a new ticket system for this? No, if the existing one is API- or webhook-capable. Picture a 100-person company with Jira Service Management or Zammad — both can be connected without replacing the system. If, however, work is still done in a shared mailbox, that’s worth a separate pre-conversation: first ticket system, then triage automation.
How long until it shows up in daily work? In the first 4–6 weeks the changes are small and under observation, deliberately. Noticeable relief typically comes in the second to third month, once the classifier runs stably for the most common categories. Anyone promising “groundbreaking efficiency leaps” after just two weeks has either no human in the loop or no realistic understanding of the Mittelstand.