Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines


Roman Genov and Gert Cauwenberghs

Department of Electrical and Computer Engineering

Johns Hopkins University, Baltimore, MD 21218

{roman,gert}@jhu.edu

Abstract

A mixed-signal paradigm is presented for high-resolution parallel inner-product computation in very high dimensions, suitable for efficient implementation of kernels in image processing. At the core of the externally digital architecture is a high-density, low-power analog array performing binary-binary partial matrix-vector multiplication. Full digital resolution is maintained even with low-resolution analog-to-digital conversion, owing to random statistics in the analog summation of binary products. A random modulation scheme produces near-Bernoulli statistics even for highly correlated inputs. The approach is validated with real image data, and with experimental results from a CID/DRAM analog array prototype in 0.5 µm CMOS.

1 Introduction

Analog computational arrays [1, 2, 3, 4] for neural information processing offer very large integration density and throughput as needed for real-time tasks in computer vision and pattern recognition [5]. Despite the success of adaptive algorithms and architectures in reducing the effect of analog component mismatch and noise on system performance [6, 7], the precision and repeatability of analog VLSI computation under process and environmental variations are inadequate for some applications. Digital implementation [10] offers absolute precision limited only by wordlength, but at the cost of significantly larger silicon area and power dissipation compared with dedicated, fine-grain parallel analog implementation, e.g., [2, 4].

The purpose of this paper is twofold: to present an internally analog, externally digital architecture for dedicated VLSI kernel-based array processing that outperforms purely digital approaches by a factor of 100-10,000 in throughput, density and energy efficiency; and to provide a scheme for digital resolution enhancement that exploits Bernoulli random statistics of binary vectors. The largest gains in system precision are obtained for high input dimensions. The framework allows operation at full digital resolution with relatively imprecise analog hardware, and with minimal cost in implementation complexity to randomize the input data.

The computational core of inner-product based kernel operations in image processing and pattern recognition is that of vector-matrix multiplication (VMM) in high dimensions:

$$Y_m = \sum_{n=0}^{N-1} W_{mn} X_n \qquad (1)$$

with $N$-dimensional input vector $X_n$, $M$-dimensional output vector $Y_m$, and $M \times N$ matrix elements $W_{mn}$. In artificial neural networks, the matrix elements $W_{mn}$ correspond to weights, or synapses, between neurons. The elements also represent templates in a vector quantizer [8], or support vectors in a support vector machine [9]. In what follows we concentrate on VMM computation, which dominates inner-product based kernel computations for high vector dimensions.

2 The Kerneltron: A Massively Parallel VLSI Computational Array

2.1 Internally Analog, Externally Digital Computation

The approach combines the computational efficiency of analog array processing with the precision of digital processing and the convenience of a programmable and reconfigurable digital interface.

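The digital representation is embedded in the analog array architecture, with inputs presented in bit-serial fashion, and matrix elements stored locally in bit-parallel form:

$$W_{mn} = \sum_{i=0}^{I-1} 2^{-i-1} w_{mn}^{(i)} \qquad (2)$$

$$X_n = \sum_{j=0}^{J-1} 2^{-j-1} x_n^{(j)} \qquad (3)$$

decomposing (1) into:

$$Y_m = \sum_{n=0}^{N-1} W_{mn} X_n = \sum_{i=0}^{I-1} \sum_{j=0}^{J-1} 2^{-i-j-2} Y_m^{(i,j)} \qquad (4)$$

with binary-binary VMM partials:

$$Y_m^{(i,j)} = \sum_{n=0}^{N-1} w_{mn}^{(i)} x_n^{(j)} \qquad (5)$$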

The key is to compute and accumulate the binary-binary partial products (5) using an analog VMM array, and to combine the quantized results in the digital domain according to (4). Digital-to-analog conversion at the input interface is inherent in the bit-serial implementation, and row-parallel analog-to-digital converters (ADCs) are used at the output interface to quantize the partial products $Y_m^{(i,j)}$. A 512 × 128 array prototype using CID/DRAM cells is shown in Figure 1(a).
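The decomposition in (2) to (5) can be checked numerically. The following is a minimal sketch, assuming unsigned fractional binary coding in which bit plane k carries weight 2^(-k-1); the function names `to_bits` and `kerneltron_vmm` are hypothetical, and the analog summation and ADC quantization are idealized as exact integer counts.

```python
import numpy as np

def to_bits(values, n_bits):
    """Split unsigned fractions in [0, 1) into bit planes, MSB first.

    Bit plane k carries weight 2**(-k-1) (assumed coding).
    Returns an integer array of shape (n_bits,) + values.shape.
    """
    scaled = np.round(np.asarray(values) * 2**n_bits).astype(np.int64)
    return np.stack([(scaled >> (n_bits - 1 - k)) & 1 for k in range(n_bits)])

def kerneltron_vmm(W, X, I=4, J=4):
    """Recombine binary-binary partial products into the full VMM result."""
    W = np.asarray(W, dtype=float)
    X = np.asarray(X, dtype=float)
    w_bits = to_bits(W, I)            # shape (I, M, N), stored bit-parallel
    x_bits = to_bits(X, J)            # shape (J, N), presented bit-serially
    Y = np.zeros(W.shape[0])
    for j in range(J):                # input bit planes, one per time step
        for i in range(I):            # matrix bit planes (parallel in hardware)
            partial = w_bits[i] @ x_bits[j]        # partial product, as in (5)
            Y += 2.0 ** (-i - j - 2) * partial     # digital recombination, as in (4)
    return Y

# Example with 4-bit operands that are exactly representable.
rng = np.random.default_rng(0)
W = rng.integers(0, 16, size=(3, 8)) / 16.0
X = rng.integers(0, 16, size=8) / 16.0
print(np.allclose(kerneltron_vmm(W, X), W @ X))
```

In this idealized form the recombined result equals the direct product exactly; any loss of precision in a physical array would enter through the quantization of each partial product, which the sketch does not model.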

2.2 CID/DRAM Cell and Array

Figure 1: (a) Micrograph of the Kerneltron prototype, containing an array of 512 × 128 CID/DRAM cells, and a row-parallel bank of flash ADCs. Die size is [...] in 0.5 µm CMOS technology. (b) CID computational cell with integrated DRAM storage. Circuit diagram, and charge transfer diagram for active write and compute operations.

The unit cell in the analog array combines a CID computational element [12, 13] with a DRAM storage element. The cell stores one bit of a matrix element $w_{mn}^{(i)}$, performs a one-quadrant binary-binary multiplication of the stored bit $w_{mn}^{(i)}$ and the input bit $x_n^{(j)}$ in (5), and accumulates the result across cells with common $m$ and $i$ indices. The circuit diagram and operation of the cell are given in Figure 1(b). An array of cells thus performs (unsigned) binary multiplication (5) of matrix $w_{mn}^{(i)}$ and vector $x_n^{(j)}$ yielding $Y_m^{(i,j)}$, for values of $i$ in parallel across the array, and values of $j$ in sequence over time.
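The schedule described above, with matrix bit planes handled in parallel across the array and input bit planes applied one per time step, can be sketched as follows. This is an illustrative model only: the function name `array_cycles` and the row ordering (one output line per pair of output index and matrix bit index) are assumptions, and each output line is idealized as an exact count.

```python
import numpy as np

def array_cycles(w_bits, x_bits):
    """Model the array schedule for binary-binary partial products.

    w_bits: binary array of shape (I, M, N), matrix bit planes.
    x_bits: binary array of shape (J, N), input bit planes.
    Returns partial products of shape (J, I, M).
    """
    w_bits = np.asarray(w_bits, dtype=np.int64)
    x_bits = np.asarray(x_bits, dtype=np.int64)
    I, M, N = w_bits.shape
    # One output line per (m, i) pair; row index m * I + i is an assumed layout.
    cell_array = w_bits.transpose(1, 0, 2).reshape(M * I, N)
    partials = []
    for x_plane in x_bits:                       # one input bit plane per time step
        counts = cell_array @ x_plane            # all M * I lines sensed together
        partials.append(counts.reshape(M, I).T)  # back to shape (I, M)
    return np.stack(partials)

# Example: 2 matrix bits, 3 outputs, 6 inputs, 4 input bits.
rng = np.random.default_rng(0)
w_bits = rng.integers(0, 2, size=(2, 3, 6))
x_bits = rng.integers(0, 2, size=(4, 6))
print(array_cycles(w_bits, x_bits).shape)
```

The loop over `x_bits` is the only sequential part of the model, which mirrors the bit-serial presentation of the inputs; everything inside one iteration corresponds to a single parallel operation of the array.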

The cell contains three MOS transistors connected in series as depicted in Figure 1(b). Transistors M1 and M2 comprise a dynamic random-access memory (DRAM) cell, with switch M1 controlled by Row Select signal RS. When activated, the binary quantity $w_{mn}^{(i)}$ is written in the form of charge (either $\Delta Q$ or 0) stored under the gate of M2. Transistors M2 and M3 in turn comprise a charge injection device (CID), which by virtue of charge conservation moves electric charge between two potential wells in a non-destructive manner [12, 13, 14].

The charge left under the gate of M2 can only be redistributed between the two CID transistors, M2 and M3. An active charge transfer from M2 to M3 can only occur if there is non-zero charge stored, and if the potential on the gate of M2 drops below that of M3 [12]. This condition implies a logical AND, i.e., unsigned binary multiplication, of $w_{mn}^{(i)}$ and $x_n^{(j)}$. The multiply-and-accumulate operation is then completed by capacitively sensing the amount of charge transferred onto the electrode of M3, the output summing node. To this end, the voltage on the output line, left floating after being pre-charged to $V_{dd}/2$, is observed. When the charge transfer is active, the cell contributes a change in voltage $\Delta V_{out} = \Delta Q / C_{M3}$, where $C_{M3}$ is the total capacitance on the output line across cells. The total response is thus proportional to the number of actively transferring cells. After deactivating the input $x_n^{(j)}$, the transferred charge returns to the storage node M2. The CID computation is non-destructive and intrinsically reversible [12], and DRAM refresh is only required to counteract junction and subthreshold leakage.
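The sensing step can be illustrated with a simple behavioral model of one output line. This is a sketch under stated assumptions, not a circuit simulation: the function name `sense_output_line` is hypothetical, the charge packet and line capacitance values are arbitrary illustrative numbers, and only the magnitude of the voltage change is modeled.

```python
import numpy as np

def sense_output_line(w_row, x_bits, delta_q=1e-15, c_line=1e-12):
    """Behavioral model of one output line of the array.

    w_row:   stored bits of the cells on this line (0 or 1).
    x_bits:  input bits applied to the same cells (0 or 1).
    delta_q: charge packet per active cell, in coulombs (illustrative value).
    c_line:  total capacitance on the output line, in farads (illustrative value).
    Returns (number of active cells, magnitude of the voltage change in volts).
    """
    w_row = np.asarray(w_row, dtype=bool)
    x_bits = np.asarray(x_bits, dtype=bool)
    # A cell transfers charge only if it stores charge AND its input is active.
    active = w_row & x_bits
    n_active = int(active.sum())
    # Each active cell contributes delta_q / c_line, so the response is linear
    # in the number of actively transferring cells.
    delta_v = n_active * delta_q / c_line
    return n_active, delta_v

n_active, delta_v = sense_output_line([1, 0, 1, 1], [1, 1, 0, 1])
print(n_active, delta_v)
```

Because the voltage change scales with the count of active cells, quantizing it with an ADC recovers the binary-binary partial product for that line, up to the resolution of the converter.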

The bottom diagram in Figure 1(b) depicts the charge transfer timing diagram for write