Regular Expression Sample (VB Version)

VS SDK Regular Expression Language Service Example Deep Dive (VB)

István Novák (DiveDeeper), Grepton Ltd.

May, 2008

Introduction

This example implements a small language service for demonstration purposes. It is called the Regular Expression Language Service because it tokenizes text with RegEx patterns (lowercase letters, capital letters, digits) and applies its own syntax coloring to each token. Although the sample is far from a full language service, it illustrates the basics. The source files belonging to it contain only about three hundred lines of essential code. Reading through this deep dive, you will become familiar with the following concepts:

How should language services be registered with Visual Studio?

What lifecycle management tasks does a simple language service have?

How do you create a very simple language service?

How do you implement a scanner that supports syntax coloring?

To understand the concepts treated here, you should be familiar with the idea of VSPackages and know how to build and register very simple (even non-functional) packages. For more information about packages, have a look at the Package Reference Sample (VisualBasic Reference.Package sample). Very basic knowledge of regular expressions is also expected.

Regular Expression Language Service

Open the Microsoft Visual Studio 2008 SDK Browser and select the Samples tab. In the top middle list, search for the “VisualBasic Example.RegExLangServ” sample. Use the “Open this sample in Visual Studio” link in the top right panel of the browser application to prepare the sample. The sample opens in Visual Studio 2008.

Running the sample

Rebuild the package and start it with the Experimental Hive. Without creating a new solution, add a new text file with the File|New|File... menu function. Use the File|Save As menu function to store the text file with the RegexFile.rgx name. To avoid attaching the .txt extension to the end of the file name, set the “Save as type” to “All files (*.*)” as illustrated in Figure 1:

Figure 1: Save the file with .rgx extension

Type a few words, numbers and punctuation characters into the editor and see how they are colored. You can see an illustration in Figure 2:

Figure 2: Our language service has effect on syntax coloring

Now, try to save the file again with the File|Save As menu function. This time the Save As dialog contains “RegEx File (*.rgx)” in its “Save as type” field, indicating that it recognizes the file type with the .rgx extension.

The structure of the sample

The solution contains a VSPackage project named RegExLangServ that references a few VS interop assemblies whose names start with “Microsoft.VisualStudio”. The project’s source files are the following:

The essential code of this sample is in the RegExLangServ.vb, RegExScanner.vb and VsPkg.vb files; the following scenarios focus on them. In the code extracts used in this deep dive I omit or change comments for better readability and remove Imports statements, namespace declarations and other non-relevant elements.

Scenario: Registering the Language Service with an associated file extension

The language service this sample implements is intended to be used by the Visual Studio Shell and by any third-party package that wants to use the functionality of the service. For example, the code window of Visual Studio uses this service for syntax coloring. Just like any other service, a language service has to be registered with Visual Studio. The registration information is provided by attributes decorating the package class (VsPkg.vb):

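<ProvideLanguageService(GetType(RegularExpressionLanguageService), ...)> _
<ProvideLanguageExtension(GetType(RegularExpressionLanguageService), ".rgx")> _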
' --- Other attributes omitted
Public NotInheritable Class RegularExpressionLanguageServicePackage
    Inherits Shell.Package
    Implements IDisposable
    ' ...
End Class

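Note that a few attributes are not shown in the code extract above. If you are not familiar with them, take a look at the Package Reference Sample Deep Dive. Language service registration uses two attributes: ProvideLanguageService registers the language service with Visual Studio, and ProvideLanguageExtension associates a file extension (here .rgx) with it.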

With these attributes we have registered the regular expression language service. However, to use the service we also have to take care of service instantiation.

Scenario: Lifecycle management of a language service

Just as with other local or proffered services, our package must manage the lifecycle of the regular expression language service. For most services created with the Managed Package Framework, lifecycle management is only about creating the service instance. For language services we must also take care of cleanup, since behind the scenes language services use unmanaged code and unmanaged resources. Our package class uses the standard pattern for managing the lifecycle of the language service instance:

Public NotInheritable Class RegularExpressionLanguageServicePackage
    Inherits Shell.Package
    Implements IDisposable

    Private langService As RegularExpressionLanguageService

    ' Called when the package is sited: create, site and proffer the service.
    Protected Overrides Sub Initialize()
        MyBase.Initialize()
        langService = New RegularExpressionLanguageService()
        langService.SetSite(Me)
        Dim sc As IServiceContainer = CType(Me, IServiceContainer)
        ' The last argument (True) promotes the service to the parent container.
        sc.AddService(GetType(RegularExpressionLanguageService), langService, True)
    End Sub

    Protected Overrides Overloads Sub Dispose(ByVal disposing As Boolean)
        Try
            If disposing Then
                ' Release the language service before the package itself.
                If langService IsNot Nothing Then
                    langService.Dispose()
                End If
            End If
        Finally
            MyBase.Dispose(disposing)
        End Try
    End Sub

    Public Overloads Sub Dispose() Implements IDisposable.Dispose
        Dispose(True)
        GC.SuppressFinalize(Me)
    End Sub
End Class

Since our package’s goal is to provide the regular expression language service, we create the service instance as soon as the package is loaded into memory and sited (this is when the overridden Initialize method is called). The language service is sited in our package, added to the package’s service container and also promoted to the parent container.
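The effect of the promote argument can be tried outside Visual Studio. The following is only a minimal sketch: it uses the general-purpose ServiceContainer class from System.ComponentModel.Design rather than the package's own container, and SketchService and IsVisibleInParent are hypothetical names introduced for illustration. It assumes that promotion in the package behaves analogously to this standard container.

```vb
Imports System
Imports System.ComponentModel.Design

' Hypothetical stand-in for the language service; not part of the sample.
Public Class SketchService
End Class

Public Module ServicePromotionSketch
    ' Adds a service to a child container and reports whether the
    ' parent container can resolve it afterwards.
    Public Function IsVisibleInParent(ByVal promote As Boolean) As Boolean
        Dim parent As New ServiceContainer()
        ' The child uses the parent as its parent service provider.
        Dim child As New ServiceContainer(parent)
        Dim service As New SketchService()

        ' With promote = True the service is handed on to the parent container.
        child.AddService(GetType(SketchService), service, promote)

        Return parent.GetService(GetType(SketchService)) Is service
    End Function

    Public Sub Main()
        Console.WriteLine("promote = True:  visible in parent = {0}", IsVisibleInParent(True))
        Console.WriteLine("promote = False: visible in parent = {0}", IsVisibleInParent(False))
    End Sub
End Module
```

Passing True is what makes the service reachable from outside the container that created it, which is why the package uses it for a service that other components are expected to request.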

In the overridden Dispose method we release the resources held by the language service and then clean up the other resources held by the package. The overridden Dispose is called from the public Dispose method, which implements the IDisposable interface. Since our package is cleaned up here, we call GC.SuppressFinalize to avoid a second cleanup of the package instance by the finalizer. The lifecycle management pattern used here should be applied to your own language services.

Scenario: Implementing a small language service

The code editor built into Visual Studio can be customized by language services. These customization features include brace matching, syntax coloring, IntelliSense and many others. For the code editor to make use of a language service, it must be able to access a few of the service’s functions.

One such function is access to the so-called scanner and parser of the language service. The scanner is responsible for retrieving tokens such as keywords, identifiers, double-precision numbers, strings, comments and many others from the source text. The parser is responsible for understanding what the sequence of tokens means, whether it matches the expected language syntax, and so on.

Syntax coloring basically uses only the scanner, but it can also use the parser, for example to apply different colors to value and reference types. Brace matching generally uses the parser to find the matching pairs of opening and closing braces.

In this example we use a small language service based on regular expressions that uses only the scanner for syntax coloring and has no parser for more complex tasks.

To be a language service, we must create a COM object implementing a few interfaces with the IVsLanguage prefix in their names. The Managed Package Framework provides the LanguageService class, which is a better starting point than implementing the interfaces from scratch. To create a language service of our own, we derive a class from LanguageService and override a few methods, as the following code extract shows:

<ComVisible(True)> _
<Guid("...")> _
Friend Class RegularExpressionLanguageService
    Inherits LanguageService

    Private scanner As RegularExpressionScanner
    Private preference As LanguagePreferences

    ' No parser is implemented in this sample.
    Public Overrides Function ParseSource(ByVal req As ParseRequest) As AuthoringScope
        Throw New NotImplementedException()
    End Function

    Public Overrides ReadOnly Property Name() As String
        Get
            Return "Regular Expression Language Service"
        End Get
    End Property

    Public Overrides Function GetFormatFilterList() As String
        Return VSPackage.RegExFormatFilter
    End Function

    ' The scanner is created on first use and then reused.
    Public Overrides Function GetScanner(ByVal buffer As _
        Microsoft.VisualStudio.TextManager.Interop.IVsTextLines) As IScanner
        If scanner Is Nothing Then
            scanner = New RegularExpressionScanner()
        End If
        Return scanner
    End Function

    ' The preferences object is created on first use with default settings.
    Public Overrides Function GetLanguagePreferences() As LanguagePreferences
        If preference Is Nothing Then
            preference = New LanguagePreferences(Me.Site, _
                GetType(RegularExpressionLanguageService).GUID, _
                "Regular Expression Language Service")
        End If
        Return preference
    End Function
End Class

(This is the full code of the class; I have only changed indenting and omitted comments.)

Our RegularExpressionLanguageService must be visible to COM and so must have an explicit GUID. The overridden Name property returns the name of our language service. The GetFormatFilterList method returns the file filter expression used by the Save As dialog (“RegEx File (*.rgx)”).

The overridden ParseSource method is meant to parse the specified source code according to a ParseRequest instance. Since our language service does not implement a parser, we throw a NotImplementedException here.

Visual Studio supports language preference settings. Such preferences include IntelliSense support (enabled or not), line numbers (displayed or not), the tab size used by the language, and so on. By overriding the GetLanguagePreferences method we can tell which preferences are used by our service. In this implementation we use the default settings.

The GetScanner method is the most important one in our language service. It returns an object implementing the IScanner interface. As its name suggests, the returned object represents the scanner used to tokenize the source code text. The scanner’s work is delegated to a RegularExpressionScanner instance, which is treated in the next scenario.

Scenario: Creating the scanner to support syntax coloring

The scanner object is crucial for our regular expression language service. It implements the IScanner interface that has only two methods:

Public Interface IScanner
    Sub SetSource(source As String, offset As Integer)
    Function ScanTokenAndProvideInfoAboutIt(tokenInfo As TokenInfo, _
        ByRef state As Integer) As Boolean
End Interface

The SetSource method sets the line to be parsed together with an offset at which parsing starts. The ScanTokenAndProvideInfoAboutIt method obtains the next token from the line currently being parsed. The TokenInfo parameter passed in is a structure to be filled in by the method; it represents the token scanned. The state parameter is an integer value representing the scanner state (it is used for so-called context-dependent scanning).

The RegularExpressionScanner class implements this interface:

Friend Class RegularExpressionScanner
    Implements IScanner

    Private sourceString As String
    Private currentPos As Integer

    ' Patterns are tried in order; "." is the catch-all for any other character.
    Private Shared patternTable As RegularExpressionTableEntry() = _
        New RegularExpressionTableEntry(3) _
        { _
            New RegularExpressionTableEntry("[A-Z]?", TokenColor.Comment), _
            New RegularExpressionTableEntry("[a-z]?", TokenColor.Keyword), _
            New RegularExpressionTableEntry("[0-9]?", TokenColor.Number), _
            New RegularExpressionTableEntry(".", TokenColor.Text) _
        }

    Private Shared Sub MatchRegEx(ByVal source As String, ByRef charsMatched As Integer, _
        ByRef color As TokenColor)
        ' --- Implementation omitted from this code extract
    End Sub

    Public Function ScanTokenAndProvideInfoAboutIt(ByVal tokenInfo As TokenInfo, _
        ByRef state As Integer) As Boolean _
        Implements IScanner.ScanTokenAndProvideInfoAboutIt
        ' Nothing left on the current line.
        If sourceString.Length = 0 Then
            Return False
        End If

        Dim color As TokenColor = TokenColor.Text
        Dim charsMatched As Integer = 0
        MatchRegEx(sourceString, charsMatched, color)

        If tokenInfo IsNot Nothing Then
            tokenInfo.Color = color
            tokenInfo.Type = TokenType.Text
            tokenInfo.StartIndex = currentPos
            ' EndIndex is inclusive.
            tokenInfo.EndIndex = Math.Max(currentPos, currentPos + charsMatched - 1)
        End If

        ' Advance past the token that was just scanned.
        currentPos += charsMatched
        sourceString = sourceString.Substring(charsMatched)
        Return True
    End Function

    Public Sub SetSource(ByVal source As String, ByVal offset As Integer) _
        Implements IScanner.SetSource
        sourceString = source
        currentPos = offset
    End Sub
End Class

The implementation of the SetSource method is trivial. The ScanTokenAndProvideInfoAboutIt method uses MatchRegEx to obtain the next token. The TokenInfo structure is filled in according to the token found, and the position where the next token starts is updated.
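To see how a caller might drive the two scanner methods, here is a minimal sketch that runs without the Visual Studio SDK. SketchTokenInfo, ISketchScanner, WordScanner and Tokenize are hypothetical stand-ins introduced for illustration. The toy scanner splits a line into runs of spaces and runs of other characters, which is not what the sample's scanner does. How the editor actually calls a scanner is an assumption here: set the source once per line, then request tokens until the method returns False.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical, simplified stand-ins for the SDK's TokenInfo and IScanner.
Public Class SketchTokenInfo
    Public StartIndex As Integer
    Public EndIndex As Integer
End Class

Public Interface ISketchScanner
    Sub SetSource(ByVal source As String, ByVal offset As Integer)
    Function ScanTokenAndProvideInfoAboutIt(ByVal tokenInfo As SketchTokenInfo, _
        ByRef state As Integer) As Boolean
End Interface

' A toy scanner: each token is a run of spaces or a run of non-space characters.
Public Class WordScanner
    Implements ISketchScanner

    Private line As String = ""
    Private pos As Integer

    Public Sub SetSource(ByVal source As String, ByVal offset As Integer) _
        Implements ISketchScanner.SetSource
        line = source
        pos = offset
    End Sub

    Public Function ScanTokenAndProvideInfoAboutIt(ByVal tokenInfo As SketchTokenInfo, _
        ByRef state As Integer) As Boolean _
        Implements ISketchScanner.ScanTokenAndProvideInfoAboutIt
        If pos >= line.Length Then
            Return False ' end of the line reached
        End If
        Dim start As Integer = pos
        Dim inSpace As Boolean = (line(pos) = " "c)
        While pos < line.Length AndAlso (line(pos) = " "c) = inSpace
            pos += 1
        End While
        tokenInfo.StartIndex = start
        tokenInfo.EndIndex = pos - 1 ' inclusive end index
        Return True
    End Function
End Class

Public Module ScannerProtocolSketch
    ' Scans one line: set the source, then pull tokens until False is returned.
    Public Function Tokenize(ByVal scanner As ISketchScanner, _
        ByVal text As String) As List(Of String)
        Dim tokens As New List(Of String)()
        Dim state As Integer = 0
        Dim info As New SketchTokenInfo()
        scanner.SetSource(text, 0)
        While scanner.ScanTokenAndProvideInfoAboutIt(info, state)
            tokens.Add(text.Substring(info.StartIndex, info.EndIndex - info.StartIndex + 1))
        End While
        Return tokens
    End Function

    Public Sub Main()
        For Each token As String In Tokenize(New WordScanner(), "abc 123")
            Console.WriteLine("[" & token & "]")
        Next
    End Sub
End Module
```

Because EndIndex is inclusive, the token length is EndIndex - StartIndex + 1, which mirrors the currentPos + charsMatched - 1 calculation in the sample's scanner.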

The scanner defines a nested class called RegularExpressionTableEntry that describes a token by a RegEx and assigns a token color to it. The shared (static) patternTable array demonstrates how this structure is set up. The MatchRegEx method uses this array to obtain the next token.
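The MatchRegEx implementation is omitted from the extract, so the following is only a sketch of how such a table-driven matcher could work, not the sample's actual code. SketchColor and SketchTableEntry are hypothetical stand-ins for TokenColor and RegularExpressionTableEntry. Anchoring each pattern at the start of the remaining text and skipping empty matches are assumptions made here.

```vb
Imports System
Imports System.Text.RegularExpressions

' Hypothetical stand-in for the SDK's TokenColor enumeration.
Public Enum SketchColor
    Text
    Keyword
    Comment
    Number
End Enum

' Hypothetical stand-in for RegularExpressionTableEntry.
Public Class SketchTableEntry
    Public ReadOnly Expression As Regex
    Public ReadOnly Color As SketchColor

    Public Sub New(ByVal patternText As String, ByVal tokenColor As SketchColor)
        ' Anchor the pattern so it can only match at the start of the remaining text.
        Expression = New Regex("^(?:" & patternText & ")")
        Color = tokenColor
    End Sub
End Class

Public Module MatchRegExSketch
    ' Same patterns and color assignments as the patternTable in the extract.
    Private ReadOnly patternTable As SketchTableEntry() = { _
        New SketchTableEntry("[A-Z]?", SketchColor.Comment), _
        New SketchTableEntry("[a-z]?", SketchColor.Keyword), _
        New SketchTableEntry("[0-9]?", SketchColor.Number), _
        New SketchTableEntry(".", SketchColor.Text) _
    }

    ' Finds the first table entry matching a non-empty prefix of source.
    ' The caller is expected to pass a non-empty string.
    Public Sub MatchRegEx(ByVal source As String, ByRef charsMatched As Integer, _
        ByRef color As SketchColor)
        For Each entry As SketchTableEntry In patternTable
            Dim m As Match = entry.Expression.Match(source)
            ' "[A-Z]?" also matches the empty string, so empty matches are skipped.
            If m.Success AndAlso m.Length > 0 Then
                charsMatched = m.Length
                color = entry.Color
                Return
            End If
        Next
        ' Fallback, e.g. for a newline that "." does not match: consume one character.
        charsMatched = 1
        color = SketchColor.Text
    End Sub

    Public Sub Main()
        Dim remaining As String = "Ab1!"
        While remaining.Length > 0
            Dim matched As Integer = 0
            Dim color As SketchColor = SketchColor.Text
            MatchRegEx(remaining, matched, color)
            Console.WriteLine("{0} -> {1}", remaining.Substring(0, matched), color)
            remaining = remaining.Substring(matched)
        End While
    End Sub
End Module
```

Note that with the "?" quantifier each pattern matches at most one character, so under these assumptions every character becomes its own token. The order of the table matters, because the first entry that yields a non-empty match wins.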

Summary

The Regular Expression Language Service sample demonstrates how easy it is to create a very simple language service. The service created here implements a scanner that tokenizes the source text according to regular expression patterns. The language service also supports syntax coloring: each accepted token has a distinguishing color.

Language services must be registered in order to be accessible to the VS Shell and third-party packages. A simple attribute decorating the package that owns a language service associates the service with a file extension. When a file with the specified extension is opened in the code editor, the corresponding language service is used to edit it.
