Oh dear, I’m queued, it’s latency,
Oh dear, I’m queued, it’s latency,
My trades just never get done.
Children’s songs can be fun and endearing, but they seldom reflect deep thought or profound ideas. Many of the current ideas about latency and its reduction that are prevalent in the trade press and in conferences are neither deep nor profound; they also suffer because they are not fun or endearing. We need to hear a different tune.
About a decade ago the majority of trading was manual. Orders were ‘worked’ using phone calls and people on exchange floors. Now traders are actually concerned that their ability to execute trades is limited by the speed of light. As a result of this transformation, people are beginning to think about the elements of the trading process and how those elements can be streamlined with the goal of reducing latency. Before going further, it would be useful to have a couple of definitions:
Latency is the cumulative delay in the transmission of a message between two points. A combination of transmission distance and the myriad processing steps along the way accumulate to create latency. Put simply, latency is the inevitable result of physics.
Queuing is the backup of messages during the transmission process that occurs when the number of messages exceeds the capacity of the network and the processing functions along the way. Put equally simply, queuing is the inevitable result of poor design or massive underestimates of possible volume.
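The distinction between the two can be made concrete with a toy simulation (entirely illustrative; the function and figures below are invented, not drawn from any trading system): a link sized for a given message rate accumulates a backlog in any interval where arrivals exceed capacity, and that backlog – queuing delay – lingers long after the burst that caused it has ended.

```python
# Illustrative sketch: how a message backlog grows whenever arrivals
# exceed processing capacity in any one interval.
def simulate_queue(arrivals_per_ms, capacity_per_ms):
    """Return the backlog (messages waiting) after each 1 ms interval."""
    backlog = 0
    history = []
    for arrived in arrivals_per_ms:
        # whatever cannot be processed this interval carries over
        backlog = max(0, backlog + arrived - capacity_per_ms)
        history.append(backlog)
    return history

# A burst of 150 msgs/ms against a link sized for 100 msgs/ms: the queue
# builds during the burst and drains only slowly once traffic subsides.
burst = [150] * 5 + [80] * 5
print(simulate_queue(burst, capacity_per_ms=100))
# → [50, 100, 150, 200, 250, 230, 210, 190, 170, 150]
```

Note that even after the burst ends, every message still waits behind the accumulated backlog – which is exactly why queuing, unlike raw latency, is a design failure rather than physics.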
These concepts are important in markets where the bulk of trading has become automated. This automation encompasses both the execution of transactions and, with increasing frequency, the generation of the orders that precipitate those transactions.
Unfortunately, much of what initially sounds like sophisticated analysis of latency is as callow as our little song. The naïve statements usually take the form: ‘If your <network or process or application> has latency of < x milliseconds or even y microseconds>, you’re toast!’ One quote in an industry publication stated:
“For enterprise customers, the advantages that come from a performance boost can be dramatic. Goldman Sachs, for example, notes that every millisecond gained in its program trading applications is worth $100 million a year.”
Anon, Automated Trader, January 2007
The result of these statements and the furor they create has become what many industry observers refer to as a latency-lowering ‘arms race’.
We will look at this arms race, statements like the quote above and other comments that are being made about latency with the goal of separating the salient from the silly. One of the conclusions of this investigation will be that, while latency is critically important to the securities markets, most of the statements being made are as absurd as the faux lyrics to the children’s song that began this article. We will also see that the probable end of the pursuit of low latency will be a declaration of ‘No mas!’ If past technology arms races are a guide, at some point, probably not very long from now, someone will do a quick calculation on the back of a napkin in Harry’s Bar and realise that the possible ‘real returns’ from saved milliseconds are several orders of magnitude less than the ‘real investment’ required to achieve them. At that point we will all stand down until the next technology race.
Ironically, I believe that all of the hype has masked the very real problem of latency. I believe that the inexorable growth of message traffic over the last decade is overwhelming the systems used by those who need to execute trades quickly and effectively. However, the furor created by those trying to sell hardware, software and networks has diverted attention away from problems that need to be solved, and toward products and services that want to be sold.
Trading in the time of physics
“It is inevitable: the talk of ‘arms races’ always reminds me of unrequited sales.”
With apologies to: Love in the Time of Cholera, Gabriel Garcia Marquez
The allusion to an arms race, the fortieth anniversary of the crescendo of what has come to be called the ‘Back Office Crisis’, and the ill-considered quote above on the value of a millisecond all lead us to consider the true importance of latency, which is often missed in the hype of the trade shows. What is really important, and what is easily overlooked, is the fact that we have reached a point where the speed of light is an important factor in order routing. As we begin to think of decision making on that time scale, many of the issues we have ignored for years become critical. These overlooked issues are the foundation of this article.
The back office crisis and latency
As I write, it has been announced that Bear Stearns is being sold to J.P. Morgan because of liquidity problems – problems that forced the Federal Reserve to step in to support an investment bank, a rare event – and more ripples may yet emanate from the current turmoil. However, forty years ago right now the entirety of Wall Street was on the brink of collapse because the speed of trading had overwhelmed the infrastructure that supported the markets. One of the most important retail firms, Goodbody and Company, was forced to merge into Merrill Lynch, and a significant investment bank, Hayden Stone, was weakened and ultimately merged into Shearson & Company, now a part of Citigroup.
The condition that brought about the crisis was growth in trading volume: by 1967, shares traded on the NYSE had reached more than 20m each day. Those volumes seem puny today, but the systems of the late 1960s could not handle them. In fact the ‘systems’ were for the most part paper documents routed through manual, bureaucratic processes. As paper overwhelmed the firms’ capacity to process it, the market was closed on Wednesdays to allow them to catch up, and the trading day was shortened. Ironically, the growth in trading during what became known as the ‘go-go’ years caused more problems than the money the growing volume produced could fix. Equally ironically, the computer systems that everyone could clearly see were needed to solve the problem actually added to the chaos. Firms like Lehman Brothers struggled not only to handle the trading volume, but also to maintain a manual system and a computer system running in parallel as they tried to convert from one to the other.
For market data during the late 1960s, the over-the-counter (OTC) market, which has slowly evolved into the Nasdaq exchange, was an example of true latency. A company known as the National Quotation Bureau (predecessor of the Pink Sheets) distributed computer punch cards at the end of the trading day to major OTC dealers. Dealers would write their bids and offers on the cards, which would then be collected and processed overnight. The collected quotes for OTC issues were then printed and circulated the next morning. By my reckoning, this amounts to a latency of 17 hours 30 minutes at best, because the quotes did not update again until the following day. This can be seen in Figure 1.
Figure 1: The OTC quote reporting process in the late 1960s
The more serious lesson of the back office crisis is that when systems designed for old business conditions run up against a changed environment, the slowdown in the process – latency – can have disastrous implications for market participants.
Two other points are worth mentioning about the period of the late 1960s. First, during that time there was a very real arms race between the West and the Soviet Union. That arms race had the ultimate impact of bankrupting the Soviets, just as the current race may well divert needed resources from market participants who could better use their assets for other purposes. The second fact is that the securities industry, out of the rubble of the crisis, constructed all of the critical infrastructure elements that make the modern markets work. To list a few:
- Continuous net settlement
- Immobilised securities
- The Depository Trust Company
- The National Securities Clearing Corporation
- Negotiated commissions
- The concept of a National Securities Market in the US.
While these effects were felt first and most directly in the US in the middle 1970s, they were quickly copied in the recommendations of the Wilson Committee Report in 1979 that resulted in ‘Big Bang’ in London in 1986, and moved on to most of the other successful markets worldwide.
So what can we really say about latency
I have poked a little fun at what seems to be a silly view of latency; so what really matters? Ironically, twenty years after the back-office crisis, and twenty years ago this past October, the markets came to understand what latency, or more accurately queuing (remember the distinction), can truly mean.
In the early hours of the day of the crash, 19 October 1987, traders tried, as they typically would, to submit orders to the NYSE and other auction markets (mostly to sell) at the price they saw on their trading screens. These orders, known as marketable limits, were priced at the market and should have resulted in a trade at the standing market quote. However, what traders did not know was that the volume of market data had overwhelmed the capacity of the systems that were processing the data. As a result, market data systems were queuing as messages attempted to pass through systems that were inadequate for the volume. The result was insidious. Computer screens seemed fine. Prices continued to update and quote and news tickers continued to scroll. The problem was, the prices displayed were actually several minutes old.
As the day proceeded, traders became increasingly panicked, and began to send market orders rather than marketable limits. The result was that the price for the day was driven several hundred points (roughly 25%) below the stable price that was achieved the next day. The loss, many billions of dollars for those who sold at the bottom, was in large measure the result of queuing in the systems.
So latency does matter. What then is the view of latency that I deride?
A naive view of latency
The view of latency most commonly discussed seems to me to be very naïve. The assumption of this view is that there is only one place to send an order, and the goal is to route the order by the most direct route, to minimise the physical distance between order creation and execution, and to make sure the transaction process is streamlined. Figure 2 shows this standard argument.
Figure 2: The naïve assumption of latency
This discussion quickly devolves into issues of co-location, processor speeds and network capacity. These discussions lend themselves easily to vendors hawking ‘stuff’. When sales people smell blood in the water, the odds against rational discussion are long.
What we need is a more reasonable understanding of what latency means, and how we can address the impact.
A more realistic set of assumptions
First we need to understand when latency matters. Professor Larry Harris has written one of the best and most approachable books on trading entitled Trading and Exchanges: Market Microstructure for Practitioners. Prof Harris defines some 38 different types of trader:
Figure 3: Trading types according to Dr Harris
While Figure 3 helps to describe trading styles, for purposes of this paper I am much more interested in the types of order than the types of trader – in essence the motivation behind an order.
One way to think about the types of trader that Dr Harris has described is to think of each trader type as a distribution of the types of order he or she typically generates. Some orders are urgent because a quick execution is more important than the price. Other orders reflect the desire to participate in a security or instrument where the long-term prospect for the security – classic investment motivation – is the key objective. For investment motivation, fast execution is less important than a reasonable price and low transaction costs. Still other orders are based on a belief that the trader has important information about some fact likely to abruptly change the value of a security. In this situation the key is a fast execution, but at a price that benefits from the information. There are many other motivations, perhaps nearly as many as Dr Harris defines for trader types.
The first thing to understand is that for many, perhaps even most types of order motivation, speed is not the most critical factor. For many types of order, price is the key. Therefore much of the discussion related to latency is less important than making sure the best price is achieved.
I would like to make one aside at this point. In both Europe and the US there is growing focus on ‘best execution’. As surely as there are many motivations to trade, there should be an equal number of definitions of what constitutes a ‘good’ or ‘the best’ execution. Nevertheless, the absence of a clear definition of best execution has resulted in some justifiable concern among firms that believe it is important to have access to views of alternative execution venues with as little latency as possible. The need to have good information as a result of the pressure for best execution is not a naïve view of latency, even if the views of best execution that have created the pressure are themselves callow.
Components of latency
Returning to our primary goal of defining a more sophisticated view of latency, I would like to define four different components of latency.
As seen in Figure 4, Input latency is the time required for a firm to receive notification of an event – a trade, a new quote, or some piece of news. Input latency is a function of the capacity of the systems and the distance from the source to the point of the notification.
Figure 4: Input latency
Decision latency is the length of time required to act on knowledge of the event once the information is received. For traders sitting at display screens, it is their reaction time. For an automated system it is the time required to process the input information and formulate a decision. This is shown in Figure 5.
Figure 5: Decision latency
Processing latency, if any, is the time required to take the output of the decision process, and generate a reaction, typically an order. For a trader this means inputting an order into a trading terminal, or in older environments writing an order and passing it to an entry clerk. In an automated trading environment the decision making system may generate the order, or the decision and processing functions may be so closely linked that the distinction has little meaning. This is shown in Figure 6.
Figure 6: Processing latency
Finally, Implementation latency is the time required for the reaction to the market event to take effect. In the best of all worlds this is simply the time required to send an order to market. Unfortunately, as markets become more frenetic, orders sent to the market often cannot be executed at the price placed on the order. The initiator learns from the market that the order did not execute at the intended price, and must now place an executable order on the book instead of receiving the anticipated execution. This constitutes a new market event, and the trader or the system must decide what to do with the unexecuted order. The usual response is a cancel/replace: remove the unexecuted order and replace it with a new order at a different price. If the initial order does not execute as planned, the process of implementing a decision can be the most time-consuming element of latency. Incidentally, this also helps explain why more than 90% of orders sent to exchanges and other trading venues never execute and are cancelled. This is represented in Figure 7.
Figure 7: Implementation latency
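The cancel/replace cycle just described can be sketched in a few lines of hypothetical Python. Everything here is invented for illustration – the `ToyVenue` object is a stand-in, not a real trading API – but it captures the feedback loop: a limit priced at a possibly stale quote misses, and each miss forces another round trip of cancel, re-price and resend.

```python
from dataclasses import dataclass

@dataclass
class Result:
    order_id: int
    filled: bool

class ToyVenue:
    """Toy stand-in for a trading venue: the displayed quote lags the
    true price, so orders priced off early quotes miss."""
    def __init__(self, quotes):
        self.quotes = quotes          # displayed quote seen on each look
        self.i = 0
        self.true_price = quotes[-1]  # where the market actually is
    def quote(self, side):
        q = self.quotes[min(self.i, len(self.quotes) - 1)]
        self.i += 1
        return q                      # input latency: may already be stale
    def send_limit(self, side, qty, price):
        # a buy at or above the true price executes; otherwise it rests
        return Result(order_id=self.i, filled=price >= self.true_price)
    def cancel(self, order_id):
        pass                          # toy venue: nothing to clean up

def execute_with_chase(venue, side, qty, max_attempts=3):
    """Chase the market: after each miss, cancel and replace at a new price."""
    for _ in range(max_attempts):
        price = venue.quote(side)                 # look at the (stale?) quote
        result = venue.send_limit(side, qty, price)
        if result.filled:
            return result
        venue.cancel(result.order_id)             # miss: cancel/replace cycle
    return None                                   # give up after max attempts

# The quote drifts 100.0 → 100.5 → 101.0; two cancel/replace cycles are
# needed before the order finally executes.
print(execute_with_chase(ToyVenue([100.0, 100.5, 101.0]), 'buy', 100))
```

Each pass through the loop is a fresh round trip of input, decision, processing and implementation latency – which is why, as noted above, implementation can dominate total latency when the first order misses.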
The four components of latency are straightforward if there is only one place to send an order. You move the decision process as close to the market centre as possible, and buy extremely fast processors and high bandwidth connections. But what if the market is fragmented?
If there are two or more places to execute an order, and if you are under best execution obligations, or if you are trying to profit from the trading process itself (rather than making a long-term investment), then optimising latency becomes very difficult. If you put the decision/processing functions close to the execution venue, then the question is which one? If you have two decision/processing locations (one at each trading venue), how do you coordinate the two? When the number of trading venues increases further, the problem becomes truly complex.
I have tried to show the problem in the grid in Figure 8.
Figure 8: Optimising four components of latency
Figure 8 demonstrates that it is not possible to optimise latency absolutely. If you minimise some of the components of latency the other components become suboptimal. The best choices are highly dependent on the mix of orders a firm generates and the structure of the firm’s trading organisation.
“Too much of a good thing can be wonderful.” Mae West
I am sure someone will quickly say: ‘But this is a short-run problem. Exchanges are quickly consolidating.’ While exchanges are concentrating, the number of trading venues is at best staying the same and may well be growing. Every time two exchanges merge, their former members generally either invest in a new trading venue or create one. Moreover, MiFID has highlighted the fact that most major financial firms provide internalised markets that compete with exchanges for many users. Finally, the emergence of so-called ‘dark pools’, which are generally used for trading larger orders, creates an even larger number of trading destinations. A recent estimate counted nearly forty of these alternative trading systems. This number will decline, but not to a single dark pool. So for any reasonably active security, the problem of multiple trading locations is very real and unlikely to go away.
A more reasoned discussion of latency
So here are the relevant aspects of latency:
- Latency is an inevitable function of physics and architecture.
- Much of the current discussion about latency is the result of a sales-driven, contrived arms race that is likely to peter out when market participants discover they are spending more on simplistic solutions than they can ever recover in marginal trading returns.
- Latency does matter, and queuing, which is related to latency, can have a devastating impact on trading.
- The problems of latency do not lend themselves to a simple solution, certainly nothing that can be purchased ‘off the shelf’.
What does make sense is a careful, firm-by-firm review of the places where latency can occur. This needs to be followed by an equally careful analysis of the types of order each firm typically generates, or each trading venue receives. Finally, each organisation needs a comprehensive, continuing capacity planning process. We will review each of these briefly.
The places where latency can occur
Kirsti Suutri of Reuters has provided a very good list of potential areas where latency can occur. Her list is shown in Table 1.
Table 1: Sources of latency
| Source | Description |
| --- | --- |
| Serialisation | Delay caused by clocking data into any circuit |
| Distance | Laws of physics – rule of thumb is 1ms per 100km |
| Switching | Routers, number of hops – typically microseconds |
| Queuing | Incorrectly sized bandwidth for message bursts |
| Processor | Raw CPU speed |
| Devices | Bus, I/O devices, memory, disk |
| Platform | Operating system overhead |
| Design & integration complexity | |
| Normalisation | Translation, verification and symbology |
| Native interoperability | Data loaders |
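The ‘Distance’ rule of thumb in the table can be sanity-checked with simple arithmetic. Light in optical fibre travels at roughly two-thirds of its speed in a vacuum – about 200,000 km/s – so the raw one-way propagation delay is about 0.5ms per 100km; the table’s 1ms figure plausibly allows headroom for round trips and equipment along the route. A quick sketch (the route distance is an assumption for illustration):

```python
# Back-of-the-envelope check of the rule of thumb of ~1 ms per 100 km.
FIBRE_KM_PER_S = 200_000  # assumed effective speed of light in fibre (~2/3 c)

def one_way_delay_ms(distance_km):
    """Raw one-way propagation delay in fibre, ignoring equipment hops."""
    return distance_km / FIBRE_KM_PER_S * 1_000

# 100 km of fibre: ~0.5 ms one way, before any switching or serialisation.
print(round(one_way_delay_ms(100), 2))    # → 0.5

# A ~1,150 km New York-Chicago route: ~5.75 ms each way in fibre alone,
# an eternity at the time scales discussed in this article.
print(round(one_way_delay_ms(1_150), 2))  # → 5.75
```

The point of the exercise is that distance latency is a floor set by physics: no vendor product removes it, which is why co-location is the only remedy ever offered for this particular row of the table.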
I anticipate that the review I am suggesting will reveal countless instances of latency in existing systems that cause more problems than vendors’ low-latency solutions can solve. During the arms race between the Soviet Union and the West, people trying to pass between East and West Berlin would queue at a crossing known as Checkpoint Charlie. I suspect that a review of internal systems will reveal multiple ‘chokepoint Charlies’ that are impeding information flow. While less glamorous than hyper-low-latency solutions, correcting these bottlenecks is likely to have a far more important impact on trading profits. By the way, the evaluation of chokepoints should not stop at the boundary of your firm. You are part of a larger distribution system and you need to think of entire distribution networks, not just your own operations.
The mix of orders
The next step is to perform a careful analysis of the mix of orders a firm generates, or an exchange or trading venue receives. Orders that do not require very fast execution, and are not subject to concerns over best execution, can be satisfied with the reasonable latency of normal processing. For those orders that do require low latency, you need to perform a cost/benefit analysis to determine whether the value of lower latency exceeds the cost of implementing a very low-latency solution. I fully expect that many firms will discover it is more practical to change trading styles than to fund expensive implementations.
If there are needs for low latency and the costs can be justified, then coequal to the technology architecture has to be a decision architecture that balances the speed of decision centralisation with the speed of technology decentralisation. This will not involve a one-size-fits-all conclusion. It will be different for every firm, and may be different for different departments within the same firm.
A critical element is that both trading firms and trading venues need a credible capacity plan. From just after the back office crisis, the New York Stock Exchange has done a creditable job of planning for capacity. I can remember the 100-million-share-per-day initiative in the late 1970s, and every few years thereafter new initiatives were undertaken as old targets were exceeded. There have been problems, but in fairness, the idea of having ample capacity is instilled in the mindset (and the bonus incentives) of everyone who is responsible for technology at the Exchange. (The same can probably be said of most other market centres as well.)
In order to have ample capacity, firms need to plan wisely. How this is done will vary, but we constructed a capacity planning methodology for the US equities and options exchanges by developing a three-part process. Part one focuses on normal growth that can be extrapolated from history: each year usage grows with new customers and products in a way that can typically be forecast from the past. Part two considers growth resulting from exogenous factors that can be anticipated; examples include the changes brought about by MiFID and Reg NMS, both of which were widely anticipated, so it was possible to estimate the capacity impact of each with some careful analysis. The final part of capacity planning is for unanticipated events. These cannot be forecast, and are addressed by building what we hope is enough excess capacity over the forecast requirement to absorb them. The three components are shown in Figure 9:
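The three-part methodology can be summarised as a simple calculation. The sketch below is purely illustrative – the function name and every figure in the example are invented, not drawn from any exchange’s actual plan:

```python
# Illustrative sketch of the three-part capacity planning calculation:
# trend growth from history, anticipated exogenous step changes, and a
# headroom buffer for events that cannot be forecast.
def capacity_target(current_peak, trend_growth_rate,
                    exogenous_additions, headroom_factor):
    """Return the capacity to provision for the coming period."""
    trend = current_peak * (1 + trend_growth_rate)     # part 1: history
    anticipated = trend + sum(exogenous_additions)     # part 2: e.g. Reg NMS
    return anticipated * (1 + headroom_factor)         # part 3: buffer

# Hypothetical figures: a 1m msgs/s peak, 40% organic growth, +200k msgs/s
# for a known regulatory change, and 50% headroom for the unanticipated.
print(round(capacity_target(1_000_000, 0.40, [200_000], 0.50)))
```

The structure matters more than the numbers: the first two parts are defensible forecasts, while the third is an explicit admission that some demand cannot be forecast at all – which is exactly the lesson of 1968 and 1987.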
Figure 9: A capacity planning methodology
The current focus on low latency is having a very negative impact on the industry. Besides wasting money on ‘hyper-low latency’ solutions to problems of questionable validity, we are diverting attention from exploding message traffic and other very real problems. As in current and past arms races, the sexy and superfluous toys are causing us to lose sight of critical decisions on, and investment in, architecture and infrastructure. If reason is to triumph we have to move past the hype and focus on real problems. Perhaps US President Dwight Eisenhower said it best in his farewell address:
“In the councils of government, we must guard against the acquisition of unwarranted influence, whether sought or unsought, by the military-industrial complex. The potential for the disastrous rise of misplaced power exists and will persist.”
Dwight D Eisenhower in his Farewell Address to the Nation on 17 January 1961
To the tune of ‘Oh dear, what can the matter be?’