{"title": "Coding Time-Varying Signals Using Sparse, Shift-Invariant Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 730, "page_last": 736, "abstract": null, "full_text": "Coding time-varying signals using sparse, shift-invariant representations \n\nMichael S. Lewicki* \nlewicki@salk.edu \n\nTerrence J. Sejnowski \nterry@salk.edu \n\nHoward Hughes Medical Institute \n\nComputational Neurobiology Laboratory \n\nThe Salk Institute \n\n10010 N. Torrey Pines Rd. \n\nLa Jolla, CA 92037 \n\nAbstract \n\nA common way to represent a time series is to divide it into short-duration \nblocks, each of which is then represented by a set of basis \nfunctions. A limitation of this approach, however, is that the temporal \nalignment of the basis functions with the underlying structure \nin the time series is arbitrary. We present an algorithm for encoding \na time series that does not require blocking the data. The algorithm \nfinds an efficient representation by inferring the best temporal positions \nfor functions in a kernel basis. These can have arbitrary \ntemporal extent and are not constrained to be orthogonal. This \nallows the model to capture structure in the signal that may occur \nat arbitrary temporal positions and preserves the relative temporal \nstructure of underlying events. The model is shown to be equivalent \nto a very sparse and highly overcomplete basis. Under this model, \nthe mapping from the data to the representation is nonlinear, but \ncan be computed efficiently. This form also allows the use of existing \nmethods for adapting the basis itself to data. This approach \nis applied to speech data and results in a shift-invariant, spike-like \nrepresentation that resembles coding in the cochlear nerve. \n\n1 Introduction \n\nTime series are often encoded by first dividing the signal into a sequence of blocks. 
\nThe data within each block is then fit with a standard basis, such as a Fourier or \nwavelet basis. A limitation of this approach is that the components of the basis are arbitrarily \naligned with respect to structure in the time series. Figure 1 shows a short segment \nof speech data and the boundaries of the blocks. Although the structure in the signal \nis largely periodic, each large oscillation appears in a different position within the \nblocks and is sometimes split across blocks. This problem is particularly pronounced \nfor acoustic events with sharp onsets, such as plosives in speech. It also presents \ndifficulties for encoding the signal efficiently, because any basis that is adapted to \nthe underlying structure must represent all possible phases. This can be somewhat \ncircumvented by techniques such as windowing or averaging sliding blocks, but it \nwould be more desirable if the representation were shift invariant. \n\n*To whom correspondence should be addressed. \n\nFigure 1: Blocking results in arbitrary phase alignment of the underlying structure. (Horizontal axis: time.) \n\n2 The Model \n\nOur goal is to model a signal by using a small set of kernel functions that can be \nplaced at arbitrary time points. Ultimately, we want to find the minimal set of \nfunctions and time points that fit the signal within a given noise level. We expect \nthis type of model to work well for signals composed of events whose onset can occur \nat arbitrary temporal positions. Examples include musical instrument \nsounds with sharp attacks and plosive sounds in speech. \n\nWe assume that the time series x(t) is modeled by \n\n(1) \n\nwhere \u03c4_i indicates the temporal position of the ith kernel function,