
Universal Control Hub & Task-Based User Interfaces
Gottfried Zimmermann, Gregg Vanderheiden, University of Wisconsin-Madison, USA
Charles Rich, Mitsubishi Electric Research Laboratories, USA
Jan. 2006
Abstract
This white paper proposes the “Universal Control Hub (UCH) architecture” as an approach for implementing the Universal Remote Console framework in the context of a UPnP network. In this context networked devices can be discovered and remotely controlled through their UPnP implementation, and user interface clients can discover remote user interfaces served by Remote UI Servers according to the UPnP Remote UI specification. The UCH architecture lines up well with the upcoming CEA-2014 standard (currently being developed by CEA R7 wg9), and adds the ability for 3rd-party user interfaces and possible use of intelligent agents for task-based and/or natural-language user interfaces.
Note on the use of UPnP in the proposed architecture: Although tailored to the UPnP environment, the Universal Control Hub architecture is applicable to other networking platforms. Any of the following functions (which are assumed to be UPnP-specific in this white paper) could also be implemented in a different way: UI client discovering the UCH; UCH discovering the (controlled) devices; and UCH controlling the devices.
Introduction
The Universal Plug and Play forum has developed specifications for device classes, so-called “Device Control Protocols” (DCPs). Each DCP defines a common (machine-level) interface for a class of UPnP devices with embedded services in terms of mandatory and optional actions and state variables. Control points that have foreknowledge of the DCP that a device is using can thus easily make use of its interface.
However, a DCP says nothing about how a control point should present a UPnP device and its services to a human user. The UPnP specifications explicitly stay away from defining or adopting a framework for user interfaces. The presentation page URL, which is an optional part of a UPnP device description, hints at the need for some common mechanism for specifying a remote user interface for a UPnP device. However, the meaning of this presentation page is unclear and implementations must rely on proprietary solutions; it is therefore hardly used in current UPnP implementations.
The Universal Remote Console (URC) framework, specified as a family of ANSI standards (ANSI INCITS 389-2005 through 393-2005), can fill the user interface gap in UPnP. It defines a “Protocol to Facilitate Operation of Information and Electronic Products through Remote and Alternative Interfaces and Intelligent Agents”. Its purpose is to define a user interface layer on top of any existing interoperability framework for device discovery, control and eventing. It does so by defining a family of XML-based languages for the specification of cross-device user interfaces. The most prominent component of the URC framework is the “User Interface Socket”, a common semantic model for all user interfaces that can be used for a device. High-level, polished user interfaces can be “plugged” into the socket, thus reusing application-specific code contained in the User Interface Socket.
This white paper describes how the Universal Remote Console framework can be combined with the UPnP DCP approach, with the “Universal Control Hub” being the core component of the proposed architecture. In a nutshell, the Universal Control Hub (UCH) allows for both thin UPnP devices and thin user interface clients. It provides the UPnP architecture and its DCP specifications with a common user interface layer, thus allowing UPnP devices to project their user interfaces on remote clients that they have no knowledge of.
1. Proposed Architecture

Figure 1: Universal Control Hub Architecture
In the center of the proposed architecture (see figure 1) is the Universal Control Hub (UCH), which acts as a gateway between a user interface client (“UI client”) and any UPnP device that it wants to access and control. The UCH speaks the UPnP-defined DCPs to communicate with the devices. For example, it interacts with a UPnP enabled TV or DVR through the AV Device Control Protocol, and it interacts with a UPnP thermostat through the HVAC DCP. As mentioned previously, these UPnP DCPs do not include any mechanism for specifying user interfaces or the remoting of them (the RemoteUI DCP is an exception; see later). Thus a manufacturer can deploy very simple devices that don’t know anything about user interfaces and rely solely on being controlled through their corresponding DCPs.
In figure 1, there is a fourth device called “other device”. This illustrates that the architecture can accommodate a heterogeneous set of devices to be controlled. The “other device” could be a networked device implementing any other networking protocol stack, including IEEE 1394 or Echelon LonWorks.
On the other end, the UI clients don’t know anything about the DCPs to be used for controlling the devices. Some of them may use UPnP RemoteUI to discover the UCH and to pick the remoting protocol of their choice, thus acting as RemoteUI client (or at least as control point to the RemoteUI server). Others may know how to initiate a specific remoting protocol with the UCH by some other means (e.g. through setup). In any case, a UI client finds a remoting protocol on the UCH through which it can remotely access and control any one of the devices.
Figure 1 lists some remoting protocols as examples:
- The Scalable Vector Graphics (SVG) format of the W3C offers a scalable (vector-based) format for rendering interactive graphics. It may be conveyed over HTTP (including the back-channel for control commands). In general, SVG has somewhat extensive computing demands on the rendering device, but there is also a light SVG version (“SVG Tiny”) for smaller rendering devices.
- Dynamic HTML. HTML is the prevalent user interface standard in the World Wide Web, often combined with scripting languages such as JavaScript (hence “Dynamic HTML” or DHTML). This kind of user interface is well-known and many tools exist for the development of DHTML user interfaces. In this example, we assume that the DHTML remoting protocol runs on HTTP for UI serving and (possibly) as a back-channel for communication from the UI client to the UCH. (Side note: The upcoming CEA-2014 standard, developed by the Consumer Electronics Association, will include a specification of a specific DHTML profile, called “CE-HTML”.)
- Macromedia Flash is a widely used framework for rendering dynamic content over the Web. Flash Remoting, a remoting protocol from Macromedia built on top of Flash, allows for a distributed architecture in which a thin UI client binds to a remote application server. However, for the user interface developer this remote binding is transparent.
- XRT2 is a light-weight binary remoting protocol. With XRT2 even ultra-thin devices with limited graphical capabilities can be used as UI clients.
- It is possible to remotely control devices through audio (including voice) only, over a phone line. A user interface serving this remoting protocol could be specified in the W3C’s VoiceXML standard language. Of course, this requires that the UCH have a phone link built in.
- Basically any user interface description (standardized or proprietary) may be used as a remoting protocol. The architecture does not impose any restrictions in this regard. As an additional example (not contained in figure 1) a “native Pocket PC” protocol could convey binary code for the rendering of a graphical user interface on a Pocket PC based PDA or smart phone.
- The last remoting protocol (“URC/HTTP”) is based on the Universal Remote Console framework and provides direct access to a User Interface Socket. This can be used by UI clients as a fall-back option in cases when no tailored user interface is available. More importantly, this protocol can be used by intelligent agents to get direct access to the functionality of a set of devices without the need for screen-scraping or similar approaches. We will come back to this protocol later since it is important for advanced user interfaces of the future.
Remoting protocols are in most cases offered to UI clients by URI. (The phone line is an exception.) The protocol URI determines the remoting protocol which may include a session identifier or other information about the state of a control interaction. For example, the URI “http://192.168.1.1/svg” may serve a portal-style SVG interface that lets the user pick a device from a list of available back-end devices. For a UI client that has already picked a protocol and back-end device through the UPnP RemoteUI procedure, the URI “http://192.168.1.1/svg/dvr” may immediately provide an SVG interface for the DVR.
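As a rough illustration of how a UCH might interpret such protocol URIs, the following Python sketch splits the path into a protocol segment and an optional device segment. The function name parse_protocol_uri and the path layout (/protocol/device, plus an optional "session" query parameter) are assumptions extrapolated from the example URIs in this paper, which does not itself define a URI scheme.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlsplit


class ProtocolTarget(NamedTuple):
    protocol: str            # e.g. "svg"
    device: Optional[str]    # e.g. "dvr"; None means "show the device portal"
    session: Optional[str]   # optional session identifier from the query string


def parse_protocol_uri(uri: str) -> ProtocolTarget:
    """Split a protocol URI into protocol, device and session (hypothetical layout)."""
    parts = urlsplit(uri)
    segments = [segment for segment in parts.path.split("/") if segment]
    if not segments:
        raise ValueError("URI does not name a remoting protocol")
    protocol = segments[0]
    device = segments[1] if len(segments) > 1 else None
    session = parse_qs(parts.query).get("session", [None])[0]
    return ProtocolTarget(protocol, device, session)
```

Under these assumptions, a URI without a device segment would lead to the portal-style interface, while a URI with a device segment would go straight to that device's user interface.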
One should not think of a protocol as delivering one static version of a user interface. Server-side adaptation mechanisms may be built into the remoting protocol that may facilitate delivering user interfaces that are adapted to the UI client’s properties such as screen size and user input capabilities. For example, when a UI client requests a user interface over HTTP, the HTTP header may bear information about the device’s and the user’s preferences. Also, some user interface descriptions allow client-side adaptations such as scaling and reformatting.
Some UI clients are capable of using only one remoting protocol; others can use any one of several. For example, a desktop or laptop computer can easily use the following remote UI protocols: SVG on HTTP, DHTML on HTTP and Flash Remoting. A PDA can use DHTML on HTTP and Flash Remoting. A TV set with a remote control can use DHTML on HTTP or Flash Remoting, depending on its software capabilities. A cell phone could use any one of Flash Remoting, XRT2, or VoiceXML over a phone line. And a plain old telephone could use voice-based VoiceXML user interfaces for remote control.
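The matching of client capabilities against the protocols a UCH offers can be sketched as a simple lookup. The table below restates the examples from the preceding paragraph; the protocol labels, the client names and the idea that list order expresses preference are illustrative assumptions, not part of the architecture.

```python
from typing import Dict, List, Optional

# Protocols each kind of UI client can render, taken from the examples above.
# The identifiers ("svg", "dhtml", ...) are illustrative labels, not standard names.
CLIENT_PROTOCOLS: Dict[str, List[str]] = {
    "desktop": ["svg", "dhtml", "flash-remoting"],
    "pda": ["dhtml", "flash-remoting"],
    "tv": ["dhtml", "flash-remoting"],
    "cell-phone": ["flash-remoting", "xrt2", "voicexml"],
    "telephone": ["voicexml"],
}


def pick_protocol(client: str, offered_by_uch: List[str]) -> Optional[str]:
    """Return the first protocol in the client's list that the UCH also offers."""
    for protocol in CLIENT_PROTOCOLS.get(client, []):
        if protocol in offered_by_uch:
            return protocol
    return None  # no common protocol; the client cannot be served
```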
So far we have looked at the UCH as a “black box” which somehow bridges between user interface protocols on the UI client-side and UPnP DCP based protocols on the back-end. But how does the UCH generate the user interface descriptions for serving the RUI protocols? Does it use some pre-defined documents that are hard-coded into such a device? The answer is “YES” and “NO”. “YES” because there is some part of a user interface (the “User Interface Socket”) that is pre-defined for any DCP-standardized UPnP device. “NO” since the manufacturers of the (controlled) devices have a great interest in being able to project “their user interfaces” (bearing their corporate identity) onto the UI clients.
The User Interface Socket is the part of the remotable user interface that doesn’t change whatever remoting protocol is used to convey the user interface. It is the common semantic model of all user interfaces for a specific device, and is defined by its manufacturer. This includes all types of user interfaces with any output modality (visual, auditory, tactile, or any combination) and any input modality (keyboard, mouse, touch-based, stylus, hand-writing, gesture, etc., or any combination). A User Interface Socket contains a flat set of semantic user interface elements (called “socket elements”) that provide a synchronized communication protocol to the controlled device and its current state. UI Sockets add a logical layer on top of the DCP based constructs that is closer to the actual user interfaces than the UPnP DCP constructs are. It is easier for user interface developers to bind their widgets to UI Socket elements than to DCP actions and state variables of the device. Socket elements are either variables, commands or user notifications. The description of the UI Socket (the “Socket Description”) also specifies how socket elements depend on each other, for example that the “volume” variable can only be modified if “mute” is off. More advanced dependencies can also be described, through the notion of pre- and post-conditions.
For today’s UI client devices, the User Interface Socket would not provide enough information for constructing a nice-looking user interface. What’s missing are concrete instructions on how to build the user interface, what widgets to use and how to arrange and structure them. Also, labels need to be provided for the UI Socket elements. Widgets, structure and layout are provided by a “Pluggable User Interface”, a protocol-specific user interface description that plugs into a particular User Interface Socket. In general, a manufacturer will provide for each of its products a User Interface Socket plus a set of Pluggable User Interfaces for the most common UI client types, and deploy them to a Resource Server. The Resource Server may be company-owned or provided by any other organization such as a consortium. Other parties may create complementary Pluggable User Interfaces and make them available through the same or other Resource Servers. A Universal Control Hub that encounters a particular device will look for Pluggable User Interfaces for that device, searching on any Resource Server on the Internet.
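To make the idea of a flat set of socket elements with dependencies more concrete, here is a minimal Python sketch. It models only variables (commands and user notifications are left out) and expresses the “volume can only be modified if mute is off” dependency as a predicate. The class names, the writable_if attribute and make_tv_socket are hypothetical; the actual Socket Description is an XML document defined by the URC standards, not Python.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class SocketVariable:
    value: object
    # Optional dependency: returns True while the variable may be modified.
    writable_if: Optional[Callable[["UISocket"], bool]] = None


@dataclass
class UISocket:
    # A flat set of elements; commands and notifications are omitted for brevity.
    variables: Dict[str, SocketVariable] = field(default_factory=dict)

    def get(self, name: str) -> object:
        return self.variables[name].value

    def set(self, name: str, value: object) -> None:
        variable = self.variables[name]
        if variable.writable_if is not None and not variable.writable_if(self):
            raise PermissionError(f"'{name}' cannot be modified in the current state")
        variable.value = value


def make_tv_socket() -> UISocket:
    """Build the example from the text: volume is writable only while mute is off."""
    return UISocket({
        "mute": SocketVariable(False),
        "volume": SocketVariable(10, writable_if=lambda s: s.get("mute") is False),
    })
```

A widget bound to "volume" could consult the same predicate to grey itself out while "mute" is on, which is the kind of binding the socket layer is meant to simplify.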
At this point it is important that there be a defined procedure by which the UCH decides which Pluggable User Interface to use if multiple are available. The UCH is part of an implicit contract between the devices and the UI clients. The agreement is that if the manufacturer of a device provides a Pluggable User Interface for a specific remoting protocol, this Pluggable UI is the default user interface to be rendered on the UI client when using that protocol. Only if there is no user interface available from the manufacturer of the device, or if for some reason it is not usable by the UI client or its user, may user interfaces from other parties replace the default one. For example, if the user understands only Japanese, but the manufacturer of the device provides only European language user interfaces, a Japanese user interface that was created by a third party for that device may fill in.
An obvious question is now: What if the user comes with a UI client for which neither the device manufacturer nor any 3rd party has provided a Pluggable User Interface? This may happen if a new UI client is released to the market, or for UI clients that are used by only a small portion of users. In this case we need a fall-back option that makes a “functional user interface” (term borrowed from W3C Device Independence group) available on the UI client. A functional user interface makes the full functionality of a device available to the user, though typically not optimized with regard to user experience. In other words, it doesn’t come with bells and whistles. It may be text-only, and may require a lot of paging on smaller devices. Such a functional user interface can be created from the UI Socket and its description, without the need for a pre-defined Pluggable UI. By having direct access to the UI Socket through the URC/HTTP protocol, a UI client can generate a functional user interface on the fly.
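The selection rule described above (manufacturer first, third parties only as a replacement when the default is missing or unusable) might be sketched as follows. This is only an illustration: the PluggableUI record, its fields and the reduction of “usable” to a language match are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PluggableUI:
    protocol: str            # remoting protocol this UI is written for
    language: str            # language of its labels, e.g. "de" or "ja"
    from_manufacturer: bool  # True if provided by the device manufacturer


def select_pluggable_ui(candidates: List[PluggableUI], protocol: str,
                        user_languages: List[str]) -> Optional[PluggableUI]:
    """Prefer the manufacturer's UI; use a third-party UI only if none is usable."""
    usable = [ui for ui in candidates
              if ui.protocol == protocol and ui.language in user_languages]
    # list.sort() is stable: manufacturer UIs move to the front, order otherwise kept.
    usable.sort(key=lambda ui: not ui.from_manufacturer)
    return usable[0] if usable else None
```

If the function returns None, the UI client would have to use the fall-back route through the URC/HTTP protocol.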
Another question is whether there can be multiple UCHs in a home network. The answer is yes, since UPnP Remote UI allows for multiple Remote UI servers that a Remote UI client can each discover and connect to.
2. Sample Scenarios
To illustrate how this all works together, let’s look at some example scenarios.
2.1 Computer Controlling TV

Figure 2: Sample scenario - Computer controlling TV
In the first scenario (figure 2), a user wants to use their desktop computer (as UI client) to control the TV in the living room; both are connected to the home network. The Universal Control Hub advertises itself as a UPnP RemoteUI server device. Since the computer is UPnP aware, it discovers the UCH and interacts with it through the RemoteUI DCP. Thus the computer finds out that the UCH provides a DHTML/HTTP based remoting protocol for controlling the TV. By following the corresponding URI for the DHTML/HTTP protocol, the computer opens a DHTML/HTTP based controlling session on the UCH for TV control. The UCH is using a DHTML/HTTP Pluggable User Interface for this session that it retrieved from the TV manufacturer’s Resource Server on the Internet.
Once a Pluggable UI is downloaded from a Resource Server, it may be cached on the UCH. However, it would be reasonable for the UCH to look for updates every now and then.
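One simple way to combine caching with periodic update checks is an age-based cache, sketched below. The class name, the fetch callback and the fixed maximum age are assumptions for illustration; the paper does not prescribe a caching policy or a Resource Server API.

```python
import time
from typing import Callable, Dict, Tuple


class PluggableUICache:
    """Cache downloaded Pluggable UIs and re-fetch them once they are stale."""

    def __init__(self, fetch: Callable[[str], bytes], max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch              # hypothetical download from a Resource Server
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]              # still fresh: serve the cached copy
        data = self._fetch(key)          # missing or stale: look for an update
        self._entries[key] = (now, data)
        return data
```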
2.2 Same TV controlled by cell phone using Flash

Figure 3: Sample scenario - Same TV controlled by cell phone using Flash
The next scenario (figure 3) has the same TV being controlled by a cell phone instead of the computer. The cell phone cannot render DHTML, but finds a remoting protocol for Flash Remoting clients. Since it has a Flash Lite player installed, it follows the corresponding URI. From the initial list of back-end devices that the UCH projects onto the cell phone’s screen, the user picks the TV. The UCH retrieves and installs the Pluggable User Interface from the TV manufacturer for the Flash Remoting protocol, if not already installed. Then the TV’s flash user interface is rendered on the cell phone, as defined by the TV manufacturer.
Instead of starting a new session with the cell phone controlling the TV, the same session could have migrated from the computer to the cell phone by some kind of session URI manipulation. For example, if the URI “http://192.168.1.1/html/tv?session=xyz” denotes a DHTML/HTTP based control session to the TV, the URI “http://192.168.1.1/flashremoting/tv?session=xyz” could denote the same session but served through the Flash Remoting protocol. When migrating a session from one protocol to another, the Pluggable User Interface would be replaced but the User Interface Socket would remain.
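The URI manipulation in this example amounts to swapping the protocol segment of the path while leaving the device and the session identifier untouched. A minimal sketch, assuming the path layout of the two example URIs (the function name migrate_session_uri is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def migrate_session_uri(uri: str, new_protocol: str) -> str:
    """Swap the protocol segment of a session URI, keeping device and session."""
    parts = urlsplit(uri)
    segments = parts.path.split("/")     # "/html/tv" -> ["", "html", "tv"]
    if len(segments) < 2 or not segments[1]:
        raise ValueError("URI has no protocol segment")
    segments[1] = new_protocol
    return urlunsplit(parts._replace(path="/".join(segments)))
```

Applied to the first example URI with "flashremoting" as the new protocol, this yields the second example URI.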
2.3 Same phone controlling thermostat over phone line

Figure 4: Sample scenario - Same phone controlling thermostat, using VoiceXML
In the third scenario (figure 4), a user wants to set the temperature of the thermostat at home while driving home. Because she is driving, she cannot look at the cell phone’s screen. Instead she dials the UCH’s private phone number and hears: “Here is the Universal Control Hub. Say one of these: TV, DVR, thermostat.” She says “Thermostat” and hears “Thermostat selected”. She: “Set temperature to 68 degrees.” The UCH responds: “Thermostat set to 68 degrees”. The user hangs up.
In this scenario the UCH retrieves a Pluggable User Interface for the VoiceXML protocol from the thermostat manufacturer’s Resource Server, and binds it to the User Interface Socket for the UPnP enabled thermostat. A VoiceXML interpreter (with phone line connection) is acting as UI client.
2.4 Aggregated Flash UI for TV and DVR, showing on TV

Figure 5: Sample scenario - Aggregated Flash UI for TV and DVR, showing on TV
This scenario (figure 5) illustrates how aggregated (compound) Pluggable User Interfaces may be used to project a single user interface comprising functions of multiple controlled devices. Here a TV is used to render a Flash Remoting based user interface for both the TV and the DVR. For example, this user interface could contain the volume slider for the TV and the channel selection list for the DVR right next to each other.
The UCH finds a Flash Remoting based aggregated Pluggable User Interface for the TV and the DVR in the home network. In its list of available remoting protocols announced through the UPnP RemoteUI server service, it can now offer a URI for a “TV+DVR” user interface session based on Flash Remoting. The user can pick the session using the remote control of the TV, and thus have the aggregated user interface rendered on the TV screen.
2.5 Any UI client without tailored UI controlling DVR

Figure 6: Any UI client without tailored UI controlling DVR
In this scenario (figure 6) a user wants to control the DVR with a “peculiar” UI client for which no Pluggable UI is available. So none of the “traditional” UI protocols work for this UI client. This may be a new client device with an unusual screen size or a new user interface engine, or a client device that is used by only a minority of users. So the only protocol it can use is the URC/HTTP protocol (as a fall-back option), which provides a “raw access” mode to the UI Socket.
By connecting to the URC/HTTP protocol, the UI client can build a “functional user interface” on the fly. This involves parsing the XML documents that describe the DVR’s UI Socket and its resources such as labels, but it offers greater flexibility in how to render the user interface. For example, a UI client with a small screen may render a text-only user interface, while a voice browser may choose to render it with speech output, key navigation and voice input. In any case, the resulting functional user interface lets the user do everything they could do with any one of the Pluggable UIs, but it is typically poorer in terms of layout and the other things that make the user experience rich and appealing.
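As an illustration of what “on the fly” generation could look like for a text-only client, the sketch below renders one menu line per socket element. It assumes the socket description and the label resources have already been parsed into plain Python structures; the dictionary keys ("id", "kind", "value") and the output format are invented for the example and do not reflect the XML formats of the URC standards.

```python
from typing import Dict, List


def render_functional_ui(device: str, elements: List[Dict[str, str]],
                         labels: Dict[str, str]) -> str:
    """Render a plain-text menu with one numbered line per socket element."""
    lines = [f"== {device} =="]
    for number, element in enumerate(elements, start=1):
        label = labels.get(element["id"], element["id"])  # fall back to the raw id
        if element["kind"] == "variable":
            lines.append(f"{number}. {label}: {element['value']}")
        elif element["kind"] == "command":
            lines.append(f"{number}. [{label}]")
        else:  # user notification
            lines.append(f"{number}. ! {label}")
    return "\n".join(lines)
```

A voice browser could walk the same element list and speak each label instead of printing it; only the rendering step differs.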
2.6 Natural language and task-based user interfaces

Figure 7: Natural language and task-based user interfaces
The last scenario (figure 7) is important for a smooth transition and migration toward more advanced user interfaces such as those provided by intelligent agents. One of the challenges we have with the traditional user interfaces is that they are inherently device-centric, i.e. showing the functions of one device at a time. The user needs to constantly switch between devices if they want to harness a collection of connected devices, for example in the home theater. Some universal remote controls make up for this by programmable macros that can involve multiple devices. However, there is still a large gap between those programmable universal remotes and the ability to combine the functionality including state information of multiple devices in one user interface.
A new type of UI client, let’s call it an “intelligent agent”, can provide a user interface in terms of goals and tasks rather than devices. Through this user interface the user has instant access to any function and combination of functions of the connected devices, without the need for walking menus or following links to different pages. In addition, intelligent agents may provide a natural-language based user interface to converse with the user in a more natural manner. This may (or may not) involve voice output and/or voice input; the implementation of the intelligent agent is out of scope for the UCH architecture.
How does an intelligent agent know about the tasks that one can accomplish with the set of connected devices? In an ideal world, with unlimited machine understanding of the devices and their UI Sockets, a really intelligent agent could infer possible tasks and their relationships. However, this is not achievable today and will likely not be for the next decade. So we are proposing that the agent gets its knowledge about relevant tasks and goals through another resource, called “Task Model Description” (TMD). A TMD can be created by the manufacturer of a device, or as a joint operation involving all manufacturers of the devices. However, it is more likely that there will be third parties specializing in the creation of TMDs, providing TMDs for combinations of devices according to their device class specification. We need a standardized description format for TMDs because TMDs may be created, used and re-used across multiple companies, involving the device manufacturer, TMD specialists and intelligent agent providers.
So the intelligent agent will search for a suitable Task Model Description, and, in our scenario, finds one on the Resource Server of a 3rd party. It downloads the TMD and generates a task-based user interface based on the TMD. In general, there are many different ways the agent may use this information in its dialog with the user. This is solely up to the agent, and we have no desire to standardize the user interface itself that the agent will generate. However, in any case, the agent will receive status information and send control requests on the connected devices through the URC/HTTP protocol on the UCH. There is no other protocol that could provide the semantic level of status and control information necessary to drive the intelligent agent.
Please note that the Task Model Descriptions that we are proposing are currently not part of the URC framework and the pertinent ANSI standards. We are proposing to standardize TMDs as an extension to the URC framework.
3. How the Proposed Architecture Adds Value to UPnP
By proposing a Universal Control Hub as middleware layer between the DCP based devices and the remoting protocol based UI clients, we identify the following added values:
- The UCH provides a solution for the problem of having to serve multiple clients (cross-client solution). Through its remoting protocols it offers a set of diverse user interfaces that are tailored for specific UI client devices. Should no tailored user interface be available for a particular UI client, it can still connect to the URC/HTTP protocol as a fall-back option and thus generate a “functional user interface” on the fly. The URC/HTTP protocol can always be provided for any controllable device since it only needs the UI Socket which is also needed for any Pluggable UI.
- The UCH provides an open platform for Pluggable User Interfaces. This brings about the following features:
- The manufacturers of (controlled) devices can project their user interfaces onto UI client devices. The UCH acts as a broker of remotable user interfaces between the device and a UI client. Neither the UI client nor the UCH need to be made by the same manufacturer.
- A user interface can include functions of more than one device, thus making manual switching between UIs unnecessary.
- Easy internationalization (i18n) since Pluggable User Interfaces are easily provided as duplicates in different languages. Also, by outsourcing, third parties can translate Pluggable User Interfaces and post them to a Resource Server.
- Simplified programming model for user interface designers designing for complex UPnP devices. Some UPnP DCPs are very complex and push the limits of what the UPnP architecture can achieve. For example, the AVTransport service template (part of the AV DCP) defines an evented state variable “LastChange” that provides a summary of other state variables’ value changes in the form of an XML document. Therefore a user interface designer would have to write XML parsing code to be able to trigger user interface updates based on a back-end device’s state change. The User Interface Socket can free the user interface designer from having to deal with XML parsing. Instead the socket layer provides a flat set of variables, commands and user notifications that the UI designer can bind its interface to. APIs for the User Interface Socket layer will exist for common user interface description languages.
- The User Interface Socket model provides an open platform for task-based user interfaces and intelligent agents. It is expected that task-based user interfaces and intelligent agents will provide an answer to the simplicity challenge of the digital home. The Socket Description declares how individual UI Socket elements depend on each other. UI Socket elements are suitable for forming the leaf nodes of a task-model tree, with their parent nodes being tasks of various aggregation levels. Task-model trees could be published for device classes or combinations of device classes by the vendors of these devices, or by any third party. In addition, an intelligent agent can provide a natural language user interface based on the UI Socket, with possible extensions of the Socket Description toward knowledge modeling and semantic Web technologies. By introducing the basic model of a User Interface Socket today, we benefit in multiple ways today and are also ready for the user interface technologies of tomorrow.
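To illustrate the “LastChange” point from the list above: the sketch below shows the kind of XML parsing a UI designer would otherwise have to write, and how a socket layer could instead expose the result as a flat set of variables. The sample document is a simplified event modeled on the AVTransport LastChange format; the function name is hypothetical, and the sketch ignores the instance identifier, so it assumes a single AVTransport instance.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Simplified LastChange-style event, modeled on the UPnP AVTransport service.
SAMPLE_LAST_CHANGE = """
<Event xmlns="urn:schemas-upnp-org:metadata-1-0/AVT/">
  <InstanceID val="0">
    <TransportState val="PLAYING"/>
    <CurrentTrack val="3"/>
  </InstanceID>
</Event>
"""


def flatten_last_change(xml_text: str) -> Dict[str, str]:
    """Turn a LastChange document into a flat name -> value mapping."""
    root = ET.fromstring(xml_text)
    flat: Dict[str, str] = {}
    for instance in root:                         # one <InstanceID> per instance
        for change in instance:                   # one child per changed variable
            name = change.tag.split("}", 1)[-1]   # strip the XML namespace
            flat[name] = change.get("val", "")
    return flat
```

With such a layer in the UCH, a Pluggable UI would bind a widget to a variable like "TransportState" and never see the XML.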
4. Relation to existing CEA standards
The proposed UCH architecture fits nicely in the CEA (Consumer Electronics Association) standards arena.
First, the UCH can be designed to be a control proxy for UPnP devices that don’t come with a remote user interface themselves (kind of a thin server). The UCH would then act as RUI Server of level 1 or 2 according to the CEA-2014 draft. It would offer a set of UI applications (protocols), including the mandatory CE-HTML protocol, and the URC protocol for fall-back and intelligent agents. Beyond these two protocols, any protocol providing a specific UI application may be provided by the UCH.
Second, the UCH can bridge between devices that are implemented on different networking platforms. For example, it may serve CEA-2014 compliant user interfaces for devices based on UPnP and Echelon LonWorks, but CEA-2027 user interfaces for devices based on IEEE 1394.
We see the following benefits of using UCH architecture on top of CEA-2014 and CEA-2027:
- Open platform for user interfaces. The device manufacturer can define the constraints for all user interfaces based on the UI Socket, and 3rd parties can contribute within the given constraints. For example, a device manufacturer may want to have their logo displayed. They can include the logo in the UI Socket and mark it as “sensitive”, so that any Pluggable UI must render it in an appropriate form. (This also means that the device manufacturer has to provide their logo in multiple forms, e.g. different resolutions and in audible form as well.) With the UI Socket approach, internationalization is easy.
- Extensibility for more advanced user interfaces. This architecture provides for a smooth migration to future user interfaces that can solve the complexity problem for the user. One user interface may include functions of multiple devices from different manufacturers, without the user being required to know what each one does or does not do. Combined and simplified user interfaces can be contributed by 3rd parties, if the device manufacturer does not have the resources to provide them. This will result in improved usability and therefore higher market share for the products that implement this technology.
5. Task Modeling
In a nutshell, the goal of task-based user interfaces is to increase ease of use by:
- removing the need for printed manuals,
- and enabling the customer to use the full product feature set.
By using task models (which can be derived from hierarchical Task Model Descriptions) we allow for more flexible dialog approaches. Thus an intelligent agent may better assist a user in what they want to do. This may involve more or less guidance, depending on the user’s needs. Typically the agent would only intervene if it seems that the user needs guidance or is even requesting help from the agent.
Intelligent task-based agents have (at least) the following three key functions, all of which are supported by explicitly modeling the preconditions and postconditions of actions.
- Task Planning. The agent uses the pre/postconditions to "simulate" the effect of actions without actually executing them. For example, it is often convenient to allow a user to specify (e.g., via natural language) a desired state of the world (target) and let the agent figure out what action or sequence of actions is required to achieve that state, given the current state. The typical way to do this, often called "backward chaining" or "means-end analysis", involves symbolic reasoning about pre- and postconditions, rather than evaluating them.
- Task Execution Monitoring. The agent evaluates pre/postconditions during the execution of actions in order to improve the quality and reliability of execution. For example, if the sufficient postconditions (see below) of an action are true before the action is started, there is no need to perform the action. Similarly, if any postcondition of an action is false after the action is executed, then the action failed. Furthermore, compared to just the 'error' status, using postconditions, the agent can better characterize the error to the user and perhaps suggest a repair strategy (using general algorithms).
- Task Explanation. The pre- and postconditions constitute machine-readable documentation of the actions, which the agent can use to provide context-specific, on-the-fly explanation to the user.
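The planning and monitoring functions above can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not part of the URC framework: conditions are plain string facts, actions only ever add facts, and all names (`Action`, `plan`, `execute`, the TV/DVD facts) are hypothetical. The planner reasons backward from the goal without executing anything; the monitor evaluates the conditions around a real execution step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset   # facts that must hold before the action starts
    post: frozenset  # facts that hold once the action has succeeded


def plan(goal, state, actions, pending=frozenset()):
    """Backward chaining over pre/postconditions; nothing is executed.

    Returns (steps, simulated_state), or None if the goal is unreachable.
    """
    steps = []
    for fact in sorted(goal):
        if fact in state:
            continue                          # already true, nothing to do
        if fact in pending:
            return None                       # circular dependency
        for action in actions:
            if fact not in action.post:
                continue
            sub = plan(action.pre, state, actions, pending | {fact})
            if sub is None:
                continue                      # try another action
            sub_steps, sub_state = sub
            steps += sub_steps + [action.name]
            state = sub_state | action.post   # simulate the effect
            break
        else:
            return None                       # no action achieves this fact
    return steps, state


def execute(action, state, perform):
    """Monitor one action; `perform` is the caller's function that drives the device."""
    if action.post <= state:
        return state, "skipped: postconditions already hold"
    missing = action.pre - state
    if missing:
        return state, f"blocked: unmet preconditions {sorted(missing)}"
    state = perform(action, state)
    unmet = action.post - state
    if unmet:
        return state, f"failed: postconditions not reached {sorted(unmet)}"
    return state, "ok"


# Hypothetical home-theater actions, for illustration only.
ACTIONS = [
    Action("power_on_tv", frozenset(), frozenset({"tv_on"})),
    Action("power_on_dvd", frozenset(), frozenset({"dvd_on"})),
    Action("select_dvd_input", frozenset({"tv_on"}), frozenset({"input_dvd"})),
    Action("play_disc", frozenset({"dvd_on", "input_dvd", "disc_loaded"}),
           frozenset({"movie_playing"})),
]

steps, _ = plan({"movie_playing"}, frozenset({"disc_loaded"}), ACTIONS)
print(steps)
```

For the sample data, the planner proposes powering on the DVD player and the TV, selecting the DVD input, and then playing the disc. The status strings returned by `execute` hint at how unmet postconditions could feed an explanation to the user; a real agent would need a richer condition language than string facts.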
Note that each of these key functions may or may not involve natural language. (Side note: whether voice output and/or voice input is used is yet another, separate aspect.)
As an extension to the URC framework, we are proposing to add a standard on Task Model Descriptions. In this standard, the task model tree would be based on the UI Socket, i.e. its leaves would be elements of one or more UI Sockets. The Task Model Description would thus be independent of any networking platform specifics, and 3rd parties could contribute Task Model Descriptions. Existing task definitions (in the form of task model trees or subtrees) could be re-used across vendors, and even across networking platforms for similar devices (device classes).
5.1 Task Model Description

Figure 8: Task Model Description
A Task Model Description
- decomposes top-level goals into intermediate-level goals and ultimately into primitive device actions, thus representing a hierarchical structure (tree);
- defines steps for achieving a goal which may be (partially) ordered;
- has input and output parameters for each of its goals and subgoals;
- defines constraints on parameters and/or between them;
- and defines pre- and post-conditions for its goals and subgoals.
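As a rough illustration of the properties listed above, a Task Model Description could be held in memory as a tree of task nodes. The sketch below is an assumption for illustration only, not a proposed format: the `Task` class, its field names, and the socket element identifiers such as `tv/power` are hypothetical. Partial ordering is expressed as (before, after) pairs between sibling steps.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    name: str
    inputs: list = field(default_factory=list)       # input parameters
    outputs: list = field(default_factory=list)      # output parameters
    constraints: list = field(default_factory=list)  # constraints on parameters
    pre: list = field(default_factory=list)          # preconditions
    post: list = field(default_factory=list)         # postconditions
    steps: list = field(default_factory=list)        # subgoals; empty for a primitive
    order: list = field(default_factory=list)        # (before, after) step-name pairs
    socket_element: str = ""                         # set only on primitive actions

    def primitives(self):
        """Return the leaves of the tree, i.e. the primitive device actions."""
        if not self.steps:
            return [self]
        return [leaf for step in self.steps for leaf in step.primitives()]

    def linearize(self):
        """Return one ordering of the direct steps that respects `order`."""
        graph = {step.name: set() for step in self.steps}
        for before, after in self.order:
            graph[after].add(before)
        return list(TopologicalSorter(graph).static_order())


# Hypothetical task tree whose leaves refer to elements of two UI Sockets.
watch = Task(
    name="watch_dvd",
    inputs=["title"],
    post=["movie_playing"],
    steps=[
        Task("prepare_tv", steps=[
            Task("switch_on_tv", post=["tv_on"], socket_element="tv/power"),
            Task("select_input", pre=["tv_on"], socket_element="tv/input"),
        ], order=[("switch_on_tv", "select_input")]),
        Task("start_playback", inputs=["title"], socket_element="dvd/play"),
    ],
    order=[("prepare_tv", "start_playback")],
)

print([leaf.socket_element for leaf in watch.primitives()])
print(watch.linearize())
```

Because the leaves name socket elements rather than UPnP actions, a tree of this kind carries no networking platform specifics.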
5.2 Task-Based User Interface

Figure 9: Task-Based User Interface
One could argue that task-based user interfaces can be achieved with UI Sockets only, without the need for Task Model Descriptions. This is true only to a certain degree, as the table above illustrates.
- UI Sockets facilitate the interaction in terms of tasks and goals in a somewhat "hard-coded" fashion. However, with Task Model Descriptions (TMDs) the user interface can be authored, modified and extended more easily.
- In the same vein, the UI Socket concept supports the idea of cross-device tasks by providing a Pluggable UI that binds to multiple UI Sockets. However, TMDs allow for cross-device operation in a way that is more general and easier to author than the "hard-coded" way. In particular, TMDs make it easier to adapt to a dynamic set of available devices.
- When it comes to the user defining new tasks (and refining existing tasks), this needs to be done with TMDs (UI Sockets are not sufficient).
- Also, real collaborative interaction can only be achieved with TMDs, not with UI Sockets alone. Collaborative interaction may involve mixed initiative dialog, task guidance, plan recognition and context-sensitive explanation.
Appendix: Glossary of Main Components
User Interface Socket
- Based on ANSI INCITS 390-2005, a User Interface Socket is a functional user interface for a controlled device or service that can be rendered in any input and output modality.
- It contains elements (variables, constants, commands or user notifications) which provide status information and input capabilities for a controlled device.
- The User Interface Socket provides a common model for pluggable user interfaces that can be used to access a controlled device. For user interface designers it hides the complexity of UPnP and its DCPs, providing a simplified model (API) that they can bind their user interface objects to.
- The User Interface Socket is the basis for an open platform for pluggable user interfaces on top of UPnP.
Pluggable User Interface
- A user interface description or implementation that binds to the elements of one or more User Interface Sockets.
- A Pluggable User Interface may be specified in any programming language or user interface description language, including HTML, SVG, Flash, Java code or any other binary code.
- However, the manufacturer of a controlled device must provide at least one Pluggable User Interface that works well for a wide range of controller devices. This particular Pluggable User Interface is provided as HTML code that conforms to a "device-independent" profile.
- When programming a Pluggable User Interface, its elements (or widgets) should access User Interface Socket elements for getting and setting their values, or for command invocation (this is called "binding").
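The notion of binding can be sketched as follows. This is a toy model under stated assumptions, not the ANSI INCITS 390-2005 API: the `UISocket`, `BoundLabel` and `BoundButton` classes, their methods, and the `volume` and `volumeUp` element names are all hypothetical. The point is only that widgets read values, receive updates, and invoke commands through socket elements rather than through UPnP directly.

```python
class UISocket:
    """Toy stand-in for a User Interface Socket (hypothetical API)."""

    def __init__(self):
        self._values = {}      # variables and constants
        self._commands = {}    # command name -> callable
        self._listeners = {}   # variable name -> list of callbacks

    def define(self, name, value):
        self._values[name] = value
        self._listeners.setdefault(name, [])

    def define_command(self, name, handler):
        self._commands[name] = handler

    def get(self, name):
        return self._values[name]

    def set(self, name, value):
        self._values[name] = value
        for callback in self._listeners[name]:
            callback(value)    # push the new status to bound widgets

    def invoke(self, name):
        return self._commands[name]()

    def subscribe(self, name, callback):
        self._listeners[name].append(callback)


class BoundLabel:
    """A widget that mirrors the value of one socket variable."""

    def __init__(self, socket, element):
        self.text = str(socket.get(element))
        socket.subscribe(element, self._update)

    def _update(self, value):
        self.text = str(value)


class BoundButton:
    """A widget that invokes one socket command when pressed."""

    def __init__(self, socket, command):
        self._socket, self._command = socket, command

    def press(self):
        return self._socket.invoke(self._command)


socket = UISocket()
socket.define("volume", 10)
socket.define_command(
    "volumeUp", lambda: socket.set("volume", socket.get("volume") + 1)
)

label = BoundLabel(socket, "volume")
button = BoundButton(socket, "volumeUp")
button.press()
print(label.text)
```

In this sketch the widgets know nothing about the controlled device or its DCP; swapping in a different rendering (for example a voice prompt instead of a label) would only require a different object bound to the same elements.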
Task Model Description
A hierarchical structure (tree) of goals and subgoals that
- decomposes top-level goals into intermediate-level goals and ultimately into primitive device actions;
- defines steps for achieving a goal which may be (partially) ordered;
- has input and output parameters for each of its goals and subgoals;
- defines constraints on parameters and/or between them;
- and defines pre- and post-conditions for its goals and subgoals.
Universal Control Hub
- The Universal Control Hub is a gateway between UPnP devices (controlled devices) that speak a DCP, and controller devices that speak different user interface protocols. This approach allows for light-weight controller devices and small-footprint controlled devices.
- The UCH allows a controller device to remotely access and control a UPnP device. It allows a UPnP device to project its remotable user interface on controller devices that it has no knowledge about.
- For discovery purposes, the UCH acts as a RemoteUIServer device, letting the controller device pick whatever controlling protocol is suitable for it.
Resource Server
- A Resource Server provides Pluggable User Interfaces to Universal Control Hubs. Instead of building Pluggable User Interfaces into a product, its manufacturer deploys them to a Resource Server. Deployment and updating may occur even after the product has been shipped.
- Any party may deploy Pluggable User Interfaces to Resource Servers, but a UCH must prefer Pluggable User Interfaces from the manufacturer of a product to Pluggable User Interfaces from other sources, if suitable for a particular controller device and its user.
- For easy internationalization, a Resource Server may also provide Resource Sheets that contain labels, icons and help texts for a particular Pluggable User Interface.
- As a possible framework extension, a Resource Server may also provide Task Model Descriptions.
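The preference rule for manufacturer-provided Pluggable User Interfaces could be applied by a UCH roughly as sketched below. The sketch is an assumption for illustration: the `PluggableUI` record, the `choose_ui` function, the provider names, and the suitability test (matching UI format and user language) are hypothetical, since this paper does not define how suitability is determined.

```python
from dataclasses import dataclass


@dataclass
class PluggableUI:
    name: str
    provider: str    # who deployed this UI to the Resource Server
    formats: set     # UI description languages, e.g. {"html"}
    languages: set   # languages covered by its Resource Sheets


def choose_ui(candidates, manufacturer, controller_formats, user_language):
    """Pick a suitable UI, preferring the product manufacturer's own."""
    suitable = [
        ui for ui in candidates
        if ui.formats & controller_formats and user_language in ui.languages
    ]
    # Stable sort: manufacturer-provided UIs (key False) come first.
    ranked = sorted(suitable, key=lambda ui: ui.provider != manufacturer)
    return ranked[0] if ranked else None


# Hypothetical catalog fetched from a Resource Server.
catalog = [
    PluggableUI("voice-ui", "ThirdParty Inc.", {"voicexml"}, {"en"}),
    PluggableUI("simple-html", "ThirdParty Inc.", {"html"}, {"en", "de"}),
    PluggableUI("vendor-html", "Acme", {"html"}, {"en"}),
]

chosen = choose_ui(catalog, "Acme", {"html"}, "en")
print(chosen.name if chosen else "no suitable UI")
```

Note that the manufacturer's UI wins only among suitable candidates: for a German-speaking user in this sample catalog, the 3rd-party HTML interface would be selected instead.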