Welcome to ActiveAnalysis.net!
I recently wrote a white paper for Vicorp, an independent tool vendor for IVR systems. The following is a section from that white paper that I think is particularly interesting.
Web design has changed dramatically over the past 10 years, driven by constantly changing development and user environments. The HTML standard has evolved from a rigid, structurally based markup language into an extensible hybrid of HTML and XML (XHTML). The CSS standard helps separate presentation from content. Information, workflows and processes are assembled by connecting services into applications that are presented in a browser. Browsers such as Internet Explorer and Firefox are constantly evolving and introducing new features and functions. Websites that were initially little more than informational brochures have become self-service terminals, allowing users to perform a wide variety of transactions from their desktops and mobile devices. Websites are no longer slabs of text scattered with isolated GIFs and JPEGs; they now support multimedia applications. From a development perspective, the barriers to entry for web application development have fallen sharply with the advent of GUI-based drag-and-drop tools, which greatly shorten development and deployment time.
The evolution of the web has played a pivotal role in the evolution of voice. Beginning in the 1990s, standards such as HTML and XML, together with web browsers, set the stage for the web. These standards in turn made possible standards-based GUI drag-and-drop tools, which gave developers a quicker and simpler way to create websites. In the following years, developers used these tools to build more sophisticated, user-centric transactional websites (e.g. Amazon.com). More recently, web services have emerged in this space, enabling greater interoperability and service-oriented architecture (SOA) implementations.
Voice is building on innovations pioneered by the web. Voice-XML was created just a few years after XML, and many Voice-XML IVR systems today are deployed in a web services framework. Proprietary languages and traditional IVR systems are losing ground in the market to standards-based languages and Voice-XML platforms. This coming year, 2008, will be the first in which annual Voice-XML license shipments surpass those of traditional, legacy IVR. The growing adoption of Voice-XML reflects a fundamental shift in investment philosophy across many businesses. Whether in the web or voice environment, companies fear being stuck managing monolithic, expensive solutions that every other element in the network must conform to, be coded to and be designed around. As a result, businesses are moving beyond siloed technologies toward a common, standardized web architecture that provides interoperability across disparate systems. This is the driving force behind the SOA movement in the enterprise: SOA lets companies treat business processes and the underlying IT infrastructure as standardized, secure, reusable components that can be recombined as organizational needs change. Looking forward, web and voice technologies in the enterprise will be deployed in an SOA environment. It is important to note that although Voice-XML simplifies speech application development by providing a standard application description language, platforms based on this open standard have also been widely deployed for DTMF (touch-tone) solutions. However, speech recognition is growing faster than DTMF, making inroads into the IVR landscape and sharpening the focus on increasing automation rates, reducing costs and improving phone-based customer service.
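To make the DTMF point concrete: in Voice-XML (the W3C VoiceXML 2.0 specification), a single field can accept either a spoken phrase or a keypress, so one application can serve touch-tone and speech callers alike. The following is a minimal Python sketch, assuming a VoiceXML 2.0 platform, that builds such a form with the standard library's ElementTree. The menu entries, field and form names, and the `build_menu_form` helper are hypothetical illustrations, not part of any vendor's tooling.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical menu entries: (spoken phrase, equivalent DTMF key).
MENU = [("billing", "1"), ("support", "2"), ("agent", "0")]


def build_menu_form(menu):
    """Build a Voice-XML 2.0 document with one field that accepts either
    speech or DTMF for each menu option."""
    vxml = ET.Element("vxml", version="2.0", xmlns=VXML_NS)
    form = ET.SubElement(vxml, "form", id="main_menu")
    field = ET.SubElement(form, "field", name="choice")

    # A single prompt that advertises both input modes.
    prompt = ET.SubElement(field, "prompt")
    prompt.text = " ".join(
        f"For {phrase}, say {phrase} or press {key}." for phrase, key in menu
    )

    for phrase, key in menu:
        # <option>: the element text is the speech phrase and the dtmf
        # attribute is the keypress; either input assigns `value` to the field.
        option = ET.SubElement(field, "option", dtmf=key, value=phrase)
        option.text = phrase

    # Echo the caller's choice; a real application would route the call instead.
    filled = ET.SubElement(field, "filled")
    echo = ET.SubElement(filled, "prompt")
    echo.text = "You chose "
    ET.SubElement(echo, "value", expr="choice")

    return ET.tostring(vxml, encoding="unicode")
```

On a conforming platform, a caller who presses 2 and a caller who says "support" both fill `choice` with the same value, so the logic downstream does not need to know which input mode was used.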
From a development perspective, proprietary tools are being phased out along with legacy systems, and new tools modeled on the look and feel of GUI-based, drag-and-drop web development tools are gaining traction in the voice channel.
While the details of web and voice interface design are as different as music and painting, their core mission is identical. The success of both rests on the ‘presentation layer’, which mediates between input (whether keystrokes or utterances) and output delivered through a browser. The HTML browser displays images, sounds and text to users, just as the Voice-XML browser presents recorded audio and text-to-speech (TTS) to callers over the phone. Both are key customer touchpoints that must be optimized for cost as well as customer service. The web therefore offers voice a valuable example, both architecturally and structurally. In the early years of web design, corporate websites were used mainly as a broadcast medium, and the terms ‘viewers’ and ‘audience’ were frequently used to describe what we now call ‘users’. Since the mid-1990s, corporate websites have become more sophisticated, evolving from static to transactional, from inflexible to extensible, and from textual to visual and audible. Today, corporate websites support two-way interaction with customers and exist to give users rich self-service capabilities. User adoption is high and continues to grow year over year as more consumers turn to web self-service. Corporate websites could not have reached this level of sophistication without the introduction and subsequent evolution of standards and of intuitive development environments.

The similarities between voice and web were not always so clear. Initially, the presentation layer in the voice channel was extremely restricted: despite intense pressure to reduce live-agent costs, DTMF simply did not offer a sufficiently flexible interface. Speech represented the most economical way to reduce costs while increasing self-service sophistication in the voice channel.
However, even with the advent of commercially viable speech recognition technology in the late 1990s, businesses were still constrained by the inflexible proprietary environments of traditional IVR systems. Application development and call flow creation required extensive knowledge of vendor-specific programming and scripting languages. Businesses were therefore heavily dependent on vendor professional services, which made the total cost of ownership (TCO) of a speech solution very high and unattractive. With the advent and adoption of Voice-XML, the environment changed. Businesses now have an open standard and a flexible underlying architecture similar to that of the web, providing application reuse, portability in some cases, long-term investment protection, and intelligent transactions that do not require hordes of expensive professional services engineers. Because Voice-XML IVR solutions resemble distributed web applications, they can directly leverage existing web architecture. Voice and web applications can be hosted on the same server. These applications can then access and assemble services through common web APIs, apply business logic to them and present the appropriate data to either an HTML or a Voice-XML browser. Moreover, business rules, data access and code assets are consolidated within the application server (e.g. WebSphere). As a result, Voice-XML allows IVR interactions to be driven from a consolidated application infrastructure, enabling more intelligent, personalized interactions over the phone at a lower cost.
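The shared-server pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's API: the account data, `get_balance`, the two renderers and the `channel` parameter are all hypothetical. One business-logic function serves both channels; only the final presentation step differs, emitting HTML for a web browser or a Voice-XML document for a voice browser.

```python
from dataclasses import dataclass
from html import escape as html_escape
from xml.sax.saxutils import escape as xml_escape

# Hypothetical stand-in for a back-end service reached through a web API.
_ACCOUNTS = {"1001": 2543.10, "1002": 87.50}


@dataclass
class Balance:
    account_id: str
    amount: float


def get_balance(account_id: str) -> Balance:
    """Business logic shared by both channels (raises KeyError if unknown)."""
    return Balance(account_id, _ACCOUNTS[account_id])


def render_html(b: Balance) -> str:
    """Presentation layer for an HTML browser."""
    return (
        "<html><body>"
        f"<p>Account {html_escape(b.account_id)}: ${b.amount:,.2f}</p>"
        "</body></html>"
    )


def render_vxml(b: Balance) -> str:
    """Presentation layer for a Voice-XML browser: same data, read out via TTS."""
    dollars, cents = divmod(round(b.amount * 100), 100)
    text = f"Your balance is {dollars} dollars and {cents} cents."
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
        f"<form><block><prompt>{xml_escape(text)}</prompt></block></form>"
        "</vxml>"
    )


RENDERERS = {"web": render_html, "voice": render_vxml}


def handle_request(account_id: str, channel: str) -> str:
    """Single entry point: fetch the data once, then pick the presentation.
    `channel` is a hypothetical simplification; a real server might inspect
    request headers instead."""
    return RENDERERS[channel](get_balance(account_id))
```

In a deployment like the one described, the application server (for example WebSphere) would expose `handle_request` over HTTP, and the Voice-XML browser would fetch the resulting document just as a web browser fetches a page. The design point is that `get_balance` is written once; supporting another channel means adding a renderer, not duplicating business logic.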
The parallels between web and voice are now becoming instructive. Businesses can introduce new levels of sophistication in the voice channel by fully leveraging Voice-XML, web architecture and speech recognition, and by developing well-designed applications that drive enthusiastic customer adoption. Businesses that want to achieve this while keeping costs under control should consider a long-term tooling strategy for the voice channel.