<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
	<TITLE></TITLE>
	<META NAME="GENERATOR" CONTENT="StarOffice/5.1 (Linux)">
	<META NAME="CREATED" CONTENT="20020805;11531500">
	<META NAME="CHANGEDBY" CONTENT="bill saperstein">
	<META NAME="CHANGED" CONTENT="20020805;11582900">
</HEAD>
<BODY>
<P><BR><BR>
</P>
<P><FONT FACE="courier, monospace">I wanted to summarize the meeting
last Friday (8/2) where we discussed in greater detail the
&quot;shortcomings&quot; of the InSilicon and ARC USB cores. In
addition, we discussed the &quot;environments&quot; in which USB
would be utilized in the player. </FONT>
</P>
<P><FONT FACE="courier, monospace">First, Andy revisited the
interrupt latency issue and, with Doug's help, we calculated the
latency to enter and exit a standard interrupt handler. The
calculation showed that there is around 9.7 usec of overhead to get
into and out of the handler (operating at 62.5 MHz). In addition, the
period between interrupts when transferring 64 bytes over USB is
around 47 usec. </FONT>
</P>
<P><FONT FACE="courier, monospace">For the InSilicon solution, the
buffer is loaded during the interrupt routine. This takes an
additional 2.5 usec to load the buffer (for a total of 12.2 usec in
the handler). Solving this problem would require additional 64-byte
buffers (additional endpoints) so that the host driver can rotate
through the endpoints and the firmware on the player can load several
buffers during one interrupt routine. The additional buffers are
basically a real-estate issue. </FONT>
</P>
<P><FONT FACE="courier, monospace">For the ARC solution, the same
interrupt latency problem occurs, although the additional 2.5 usec of
latency is not present since the ARC core DMAs the data into the
buffer without processor intervention. The multiple-endpoint solution
can also be used with the ARC core to address the latency problem.
However, the ARC core does NOT require additional buffers, since the
core DMAs all the data into the latency buffer. </FONT>
</P>
<P><FONT FACE="courier, monospace">The previous discussion pertains
to the USB core in device mode. When the core is in host mode, there
appears to be NO way to overcome the interrupt latency issue. This is
mainly because the processor always has to initiate the token for
each host transaction. There appears to be no way to line up these
tokens to let the core fire off several transactions in a row. </FONT>
</P>
<P><FONT FACE="courier, monospace">Now, to see how these latency
issues affect the operation of the player during normal applications,
we can outline the normal operating environments for the player when
utilizing USB: </FONT>
</P>
<P><FONT FACE="courier, monospace"><U>Device Mode</U> (bulk transfers
only) <BR>&nbsp;&nbsp;&nbsp; connection to depot <BR>&nbsp;&nbsp;&nbsp;
connection to PC </FONT>
</P>
<P><FONT FACE="courier, monospace">&nbsp;&nbsp;&nbsp; Both of these
connections will be transferring large amounts of data, so bandwidth
over USB is important. The processor will only be running the browser
during these device mode connections. This situation is similar to
the one described above for large transfers in device mode. In the
case of InSilicon, we will see 20-25% utilization of the CPU to
perform such transfers from the depot; we wouldn't want to add more
64-byte buffers to the core if possible. In the case of ARC, we would
use the additional endpoints to allow the depot to rotate through the
endpoints and reduce CPU involvement. </FONT>
</P>
<P><FONT FACE="courier, monospace">&nbsp;&nbsp;&nbsp; Since the CPU
is not being heavily utilized during these two device mode
connections, it appears OK to use either core without a detrimental
effect. </FONT>
</P>
<P><FONT FACE="courier, monospace"><U>Host Mode</U> (the player CPU
may be heavily utilized in host mode connections, i.e. playing a game)
<BR>&nbsp;&nbsp;&nbsp; modem - ISO transfer <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
For ISO transfers, the latency issue can be reduced greatly by
allowing the packet size to increase to 1 KB </FONT>
</P>
<P><FONT FACE="courier, monospace">&nbsp;&nbsp;&nbsp; e-net - bulk
transfer <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Both cores
suffer from interrupt latency every 64 bytes of transfer </FONT>
</P>
<P><FONT FACE="courier, monospace">&nbsp;&nbsp;&nbsp; printer - bulk
transfer <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Both cores
suffer from interrupt latency every 64 bytes of transfer </FONT>
</P>
<P><FONT FACE="courier, monospace">&nbsp;&nbsp;&nbsp; camera, card
reader - bulk transfer <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Both cores suffer from interrupt latency every 64 bytes of transfer </FONT>
</P>
<P><FONT FACE="courier, monospace">&nbsp;&nbsp;&nbsp; keyboard,
joystick - bulk transfer <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Transfers are small and infrequent enough not to present a problem </FONT>
</P>
<P><FONT FACE="courier, monospace">There is one more item that should
be discussed regarding the USB cores: ease of integration and
verification. The InSilicon design is strictly a
slave device on BVCI. It should be very straightforward to
incorporate this core into the bb_chip. A good place to attach this
core would be the MI peripheral bus (similar to a ROM). This core
requires minimal glue logic, possibly to translate control signals
from one synchronous bus to another. </FONT>
</P>
<P><FONT FACE="courier, monospace">For the ARC core, it is necessary
to add the local descriptor block to avoid the 580 nsec requirement to
read the BDTs. In addition, a bus translation unit will be required
between BVCI and the C/D bus. This unit will need to operate as both
a master and a slave, so it will probably be somewhat more
complicated than a slave-only interface. </FONT>
</P>
<P><FONT FACE="courier, monospace">Finally, there is the issue of
verification. For the ARC core, it is necessary to translate the DLL
code into our I/OSim environment to test basic USB transactions in
the &quot;system&quot; environment. This effort is required for
standalone testing of the BVCI/C-D interface block. For the
InSilicon case, the complete test environment is in Verilog, with
possibly some PLI code for the transaction generators. It appears
that the translation of the BVCI host/slave tasks to the uP bus
should be straightforward. <BR>&nbsp;</FONT></P>
</BODY>
</HTML>