Showing newest posts with label Architecture. Show older posts
Showing newest posts with label Architecture. Show older posts

Tuesday, April 7, 2009

SSIS Vs Talend Open Studio

From an initial evaluation of Talend Open Studio, open source ETL tools and comparison with Microsoft SQL Server Integration Services (SSIS) here are few quick points that were observed.

Talend Open Studio is an Open Source ETL Tool. The Tool is based off of Java and Perl. The tool presents us with wide range of ETL elements like

  1. Data connectivity (supports wide range of data connectivity from mySQL to Teradata supports all the widely used ODBC drivers) while SSIS has a generic ODBC connection manager, using which one can connect to data sources.
  2. Supports Java/Perl scripts while SSIS supports Microsoft VB .NET / C# scripts (with SQL 2008).
  3. Developer support from the Talend Forums and Tool Documentation while SSIS comes with MSDN Books Online, hands on labs, screen casts and webcasts, as well as TechNet forums from Microsoft.

Coming to the various components/elements of the tool here's a very high level overview,

Microsoft SSIS

Elements

Control Flow

Containers:- For Each Loop, For Loop, Sequence Containers, etc.

Tasks:- Execute SQL Task, Execute Package Task, Data Flow Task, Script Task, etc

Data Flow

Source:- ODBC, Excel, Flat Files, XML, etc

Transformations:- Aggregate, Sort, Lookup, Slowly Changing Dimension Fuzzy lookup fuzzy grouping, etc.

Destinations:- ODBC, Excel, Flat Files, XML, etc

Event Handlers

Error Handling and Logging Mechanisms:- Event Logging, Checkpoints, Error Handling

Variables

Global and local scoped variables

Configuration Files


Talend OpenStudio

Tasks/Elements

Business Intelligence components

Supports Slowly Changing Dimensions, Supports MDX queries

Business Components

Supports Microsoft AX

Custom Code

Allows Custom Java code (like Script Task/Script Components in SSIS)

Data Quality

Adding Surrogate Keys, Lookup (Fuzzy, Inteval)

Database Component

Similar to DataFlow Container, Inputs, Outputs

Database Utilities Components

Create Tables, ParseRecordset

ELT Component

Aggregates, Filter rows/columns

File Component

All File Operations

Internet Component

FTP, RSS, Mails, Mom, Web services, Sockets, XML RPC, Files

Logs & Errors Components

Error Handlers and loggers, Job Kill,

Misc Group Components

Miscellaneous Components

Orchestration Components

Job Sequencing

Processing Components

Aggregations, Mapping, Transformations, Filtering, Denormalizations

System Components

Operating System level tasks

XML Components

All XML operations

Variables

Supports creations and usage of variables

Conclusion

In overall perspective, we save on licensing cost associated with Microsoft SQL Server Integration Services over Talend Open Studio, but we lose out on several other aspects like,

  • Unified BI toolset offered from Microsoft in the form of MS BI toolset which includes Integration Services, Reporting Services, and Analysis Services when purchasing a single license of Microsoft SQL Server 2005/2008.
  • Developer support from online communities for MS BI.
  • Online Trainings in the form of Webcasts/ Virtual Labs/ Hands on labs etc.

Monday, February 2, 2009

MSDE to SQL Express 2008 Migration

One of our Consulting engagements demanded for an MSDE to SQL Express 2008 database upgrade to occur at the time of the client application installation. So we proposed a solution for using SQL Management objects (in short SMO) to accomplish this task. SMO is defined as a collection of objects that are designed for programming all aspects of managing SQL Server. SMO are industry standard practice to provide an abstraction to the end user with the Databases. Here’s our experience with SQL Management Objects on this engagement.


Objectives:-
Our customer happens to be a giant in the retail industry. The customer had a Purchase Order Management Client application which was several releases old built on the .NET Framework 2.0 and SQL Server 2000 Microsoft Data Engine (MSDE). The client application uses a SQL Server 2000 Desktop Database Engine (MSDE) database to store most data, consisting of lookup data synchronized with the central server and worksheet data that are created and modified by the end user. The current security architecture of the application uses a SQL defined account. In order to support the latest release for the client application our development team decided to upgrade from MSDE to SQL Express 2008, since SQL Express 2008 had better security features than MSDE.

Challenges:-
The major challenges that we faced during this engagement was to come up with a strategy so that the end user/Technical support Staff had very minimal amount of task while performing this upgrade. So our strategies included,


  1. A silent installation script for installing SQL Express 2008 with a named instance
  2. When the application is upgraded, move the old worksheet data from the old MSDE instance to new SQL Express 2008 instance.
Silent Installation of SQL Express 2008
Few pre requisites for SQL Express 2008 installation are,


  1. .NET Framework 2.0 SP2 or higher
  2. Windows Installer 4.5

Since our new application was being built on .NET 3.5, we had to upgrade all the retailers’ machines with .NET Framework 3.5 SP1, but this does not ensure that we have Windows Installer 4.5 installed on the machine.
So we had to write a custom MS DOS batch script which checks for the Windows Installer 4.5 and proceed further with SQL Express installation.


Database Migration from MSDE to SQL Express 2008
Now comes the interesting part of the whole migration where, we had to come up with a strategy to migrate the existing database from MSDE to SQL Express 2008. To this the SQL Management Objects (or SMO) came in very handy for us to do the development.
Here is flow chart of actions that needs to be taken in order to migrate the database from MSDE to SQL Express 2008,



Here is the code snippet where we do the database migration from MSDE to SQL Express 2008.
First of all we must add the following references for SMO,
The Class diagram for the same is as below,
Class Constructor:-
The class constructor is being used to initialize the Server objects and to obtain a handler to the MSDE and SQL Express Instances.

Getting physical file names of the database in MSDE:-
This function gets all the physical files names of the database, a database can include several Data files and log files collections, The FileGroupCollection objects is a collection of all the data files associated with the database, while the LogFileCollection object as the name suggests has a collection of all the log files of the database. It becomes essential for us to obtain all the physical files of the database before migrating it to SQL Express 2008.



Detaching the database from MSDE and attaching it to SQL Express 2008
The Server object gives us methods to detach and attach a database; these methods are used as below:


We hope the information was useful. We will come up with such information more in the future.

Tuesday, December 30, 2008

Purchase Order Management

One of our consulting engagements on Microsoft Business Intelligence Stack is with Purchase Order Management for a retail giant in apparels and home goods across the United States. When it comes to the Retail Industry, most people tend to think it as a Point of Sale (POS) solution. Apart from a POS application, the Retail industry also has to cater to the needs of the retail stores for their Order Management. In simple terms a Purchase Order Management System is a system which is used to manage the purchase orders. But what makes a purchase order (PO) a vital document is that, it is a document issued by a buyer to a seller, indicating the type, quantities and agreed prices for products or services the seller will provide to the buyer. Sending a PO to a supplier constitutes a legal offer to buy products or services. Acceptance of a PO by a seller usually forms a once-off contract between the buyer and seller so no contract exists until the PO is accepted. So this makes a PO, very critical document for any retailer.

The company has a Purchase Order Management application suite which is several releases old. This application suite has few client applications built over Microsoft .NET coupled with legacy systems like Mainframes and RS6000 for Retail shops to place orders from their retail outlets.

Our team proposed new application architecture to cater to the needs of the PO Systems, as there were several challenges with upgrading the application, i.e., any change one of the modules in the application involved end-to-end changes in all the layers of the application. The new architecture of the application is based on .NET 3.5, WCF, BizTalk, SharePoint, SQL Server 2008 BI Stack. Now let us discuss about the role of Microsoft BI stack in this engagement.

Scenario:-
As per the existing application architecture, the retail shops and merchandising agents have the flexibility to either fill out a worksheet using a thick client application and transmit them to the server, where the purchase orders are created and moved to the Mainframes, or, the retail shops could place their orders via the merchandising agents, in which case, there were two workflows, one being the Merchandising Agents using the application to fill out a form to generate a PO or, create the PO on the Mainframe. So these two types of the purchase orders are categorized as to Application generated and non application generated purchase orders.

Worksheets are created using the Client Application is stored in the form of an XML document in the SQL Server 2000 (Client Application) and DB2 (Mainframe and RS6000) databases.

Here’s the block diagram that shows the current workflow of Purchase Order (PO) Processing,


Objective:-
The scope of this engagement is widespread, and there are several teams working different modules of the engagement.
Here are few of the modules that we were involved in,

  1. Conversion from SQL Server 2000 MSDE to SQL Express 2008 for Client Applications.
  2. Implement Integrated Windows Authentication on the Client Application.
  3. Generate Statistics and Audit Reports for Application/Feature usage.
Conversion from SQL Server 2000 MSDE to SQL Express 2008 for client application
This task looks simple though but posted a numerous issues at the time of implementation, here are the few of the tasks that were carried out for the same,
  1. Silent installation script to install SQL Express 2008 pre requisites and resume installation on system restart.
  2. Silent installation batch script to install SQL Express 2008 with a named instance.

Implement Integrated Windows Authentication on the Client Application

Implementing integrated windows authentication was rather simple task. We had to create one LDAP group for the Retail shops/Merchandising Agents and one LDAP group for the administrative users and attach them to the corresponding roles to have data entry or administrative permissions. Thanks to Credential caching feature of Windows, this ensures integrated windows authentication on SQL Express 2008 even in a disconnected mode.


Generate Statistics and Audit Reports for Application/Feature usage

The Statistics and Audit Reports for Application usage is one of the major tasks that we were involved in. There are 3 different reports,

  1. A report to determine how many retail shops are using the client application. The selection criteria for this report is flexible to produce the result based on different combinations of divisions (for ex, apparels, home goods, etc), departments (ex, winter wear, men’s footwear, accessories, etc), retail shop and worksheet transmitter, this is also called the productivity report.
  2. A report that tells how many PO’s were created for individual departments also called as master report, with the division as selection criteria.
  3. An audit report that provides the audit information regarding use of the Lead Time Calculation (LTC) logic, that suggests optimal shipping dates, within the client application, in order to assess the level of use as well as the amount of manual adjustment that is performed after the logic has been applied in the client application. The weekly report illustrates by user the methods by which ship date and ladder plan information was supplied to the client application.

To generate these reports we have used SQL Server Integration Services (SSIS) as ETL tool and SQL Server Reporting Services (SSRS) for the reporting, coupled with ASP.NET 2.0 Website to host the reports and provide selection criteria.

The SSIS packages fetches PO’s from various data sources such as the SQL Server 2000 databases, Mainframe and RS6000 DB2 databases and as per the new architecture of the system, the master data will also be available in a SQL Server 2008 databases. But since there are divisions which are still running on the older version of the application suite, it is necessary to keep the data from the older system as is and provide a switch for the package to start fetching data from SQL Server 2008 databases when all of the divisions are migrated to new version.

The productivity report and the master report are straight forward. These reports count the number of PO’s generated using client application, Mainframes, and RS6000. After getting the count of PO’s these are then stored in the corresponding tables for the reports.

The audit report requires to read through the worksheet and PO XML documents stored in the database to check whether the LTC logic was used or not. SQL Server provides us with XQuery language, using which we can query xml data type. XQuery handles both typed and untyped data. After reading through the XML documents, the values of the required fields are stored in a table. This table contains all the line items of the worksheet, in which the LTC logic was switched on/off.

Apart from these reports there are other BI requirements in this application suite which are planned, for example, analyzing quarterly sales results and forecasting revenues, using SQL Server Analysis Service. Soon we will have more updates on this blog, with more data related to Microsoft BI technologies used in this engagement.

Monday, September 22, 2008

Large Enterprise SharePoint(MOSS) deployment

MOSS has highly configurable and flexible architecture that can easily accommodate some of most challenging collaboration requirements in large enterprises. One of the biggest benefits to leveraging MOSS as opposed to developing your own architecture for portal/collaboration solutions is the prepackaged saleable design of MOSS that you get out of the box. However as with many large enterprises one of the more complex design choices is on how to make the collaboration solution available to its geographically distributed workforce . Broadly the choices would be to either adopt one of the following options

  • Centralized solution
  • Decentralized solution

In most cases a centralized solution would be recommended so that the operations, governance, High Availability(HA) and Disaster Recovery(DR) can be managed from a single location and a tighter policy can be enforced, however in some cases the network may not be as reliable or scenarios where centralization may have challenges with high latency between the branches and the central/ low bandwidth etc may cause the enterprise to adopt a decentralized solution. But how do you know which is the optimal MOSS deployment strategy in your case?

Last week we were working with a large financial service provider in North America to demonstrate the feasibility of deploying MOSS as a central solution, the strategy to improve the end user experience for a centralized MOSS solution was to:

  1. Provide a highly scaled up + scaled out MOSS farm - Based on the role of the MOSS server i.e. Web Front End(WFE) or Index Server it can be either scaled up or scaled out only. Providing the right amount of scale ensures that the centralized solution can cater to the requests from users across the enterprise.
  2. Optimal logical architecture and information architecture - the right design ensures that a single ContentDB is not the bottle neck for all requests.
  3. Compression - compression improves performance over the wire
  4. Caching - Caching can drastically improve the end user experience, a smart caching solution will ensure that only delta changes from previous requests are being sent over the wire.

Both 3 and 4(above) can be achieved by leveraging a SharePoint aware network accelerator such as Certeon, other general purpose network accelerators may be used as well, however leveraging a SharePoint aware one is always better.

As with any other large project undertaking its important to benchmark the system performance before rolling it out into production, metrics captured during the benchmark would help validate if a centralized solution would be the right choice, additionally it could provide metrics that can be used to decide if additional bandwidth needs to be arranged etc. The way we benchmarked end user experience before rolling the entire solution in production was by leveraging Shunra. Shunra can be used to capture the network conditions (latency, packet loss) between the various branches and central office, once the data is captured it can be used by the Shunra device to simulate WAN conditions in a lab. Additionally Shunra can be used to simulate what-if scenarios that may occur in your WAN. Running test scripts simulating end users and having Shunra simulate various WAN conditions can provide valuable metrics that can be used to make a more informed decision on MOSS deployment.

Performance tuning and benchmarking can be a complex undertaking, and many a times the interdependence of various network parameters during simulation and actual load on the MOSS box(es) can be confusing and perplexing, if not holistically viewed and planned it can skew the final metrics - the entire endeavor would be a waste of time if not panned and executed correctly.

Benchmarking the MOSS deployment and other design choices and rollout strategies can significantly contribute towards a successful enterprise SharePoint(MOSS) deployment. However at the end of the day its still the onus of the enterprises IT group to understand and execute the required tests prior to rollout to ensure the success of SharePoint solution. Its unfortunate that in many enterprises limited interpersonal dynamics within the IT group and lack of communication to the portal stakeholders leads to a less than desirable MOSS solution, IT group seldom looks into what the end users needs. No wonder IT is still seen as a "cost center" rather than a value center. " Technology produces its best results when an organization has the doctrine, structure, and incentives to exploit it" - a famous quote on a different context (the 9/11 commission report) however its applies to how every enterprise can strategically leverage IT.

Friday, August 8, 2008

How do you qualify something as "Cloud Computing"?

Cloud computing must be definitely be the next big thing, practically every one with an online business model is now referring to their service as cloud computing - starting from your average Joe hosting firm all the way to the SaaS/S+S vendors, every wants to ride the next buzzword wave, and thus distorts the cloud computing term all together. Part of the reason being that historically the term "cloud" loosely referred to anything that’s available online/on the internet. Ask a bunch of geeks and you would get a different explanation of cloud computing from each person (if you ever asked a bunch of people what Web 2.0 is you know what I mean). So how can we qualify if a service is really leveraging the cloud computing model?

That’s a tough question. An easier way to answer this is by first examining the behavior of services provided by some of the well known cloud computing vendors. Lets take Amazon and Google as two examples, both have different business models but under the hood when we use their cloud computing service within our application/service the 3 common behavior that we see from it are:
a. Scalability
b. Availability
c. Economical/cost effective

Lets discuss scalability first. Imagine your tasked to design an online application/service that should be "Internet scalable". Imagine (if you will) that your designing the next big social networking/ the next big Youtube etc. How do you go about designing it to support millions of users? For that matter how does anyone do it? The short answer - you do that iteratively. Iteratively is a good euphemism, the reality is that you do that after several design blunders and limitations :). In the iterative approach you would first design a cost effective solution to scale for a smaller audience and then when you start seeing more traffic you add more hardware till the point it doesn’t improve anything then you start to redesign for better scalability. This is how Amazon and Google have grown as well and have designed their overall system to support such high scalability - but interesting they have abstracted their design to a degree that it could be repackaged into a subscription based service - a cloud computing service.

But first how do you build a massively scalable solution. How would you build your architecture to accommodate linear scalability?

Any architecture can be described as being composed of two types of components
a. Stateless components
b. State-full components

Stateless components are those that only do some processing on data and don’t persist state - hence are easily to scale via a scaled out design which as a byproduct also gives you higher availability, also using scaled out architecture you can keep adding inexpensive boxes to the system thereby reducing your cost while the system continue to keep humming.

Statefull components on the other hand are those that persist state of resources that it need to work with - for e.g., File system, databases, BLOBs etc. traditionally the only option to scale these would be to via a scaled-up approach - i.e. beef up the hardware on the server. This is more expensive than a scaled out architecture and introduces single point of failure. Traditionally we have been using approaches such as data replication etc to scale out these components but it introduces several complications. Fundamentally statefull components are the ones that impact overall performance and scalability of a system.

Statefull components may be notoriously difficult to scale, however even stateless components can present scalability challenges when your planning for massively scalable/internet scalability scenarios. Cloud computing vendors have made significant investments in technology to ensure that compute intensive processing can been parallelized and distributed to the max. Googles MapReduce is a elegant approach to solving these challenges.

We need a new breed of products to handle how stateless and statefull components can be scaled and distributed, the traditional approach of persisting state in the database etc just doesn’t make sense, increasingly we see that the trend is not to store atomic level transaction information in a database as we used to but store it in the form of blobs, but at the same time ensure that we do so while managing resources economically. Large part of Amazons and Googles innovation (their magic sauce if you will) in cloud computing involves developing these proprietary components.

Another way of looking at the scalability that cloud computing gives you is the ability to scale your computing resources as an when you see demand - after all it’s a pay as you go model. Traditionally online companies have been provisioning for hardware to meet spikes in traffic (like holiday seasons etc). That would also imply that they are unnecessarily paying for the extra scale which they don’t leverage all the time. By hosting it via a cloud vendor they can dramatically reduce their operational cost.

Well its not just Amazon and Google that are thinking in this direction, several platform vendors like Microsoft and other open source groups are releasing products are that will address these challenges. You can expect to see some radically different products being released from these vendors that address the distributed massively-scalable challenges. You can also expect (hosting)companies to leverage these prepackaged cloud computing capabilities and provide it as a subscription service.

Cloud computing vendors are able to provide Internet scalability at an affordable cost and can potentially give you a better SLA that if you were to manage your own infrastructure - that’s the overall package that makes cloud computing so compelling, probably best described by Jef Bezos - "You don't generate your own electricity, why generate your own computing?". Arguably there are other factors that influence your vendor decision but we hope that the next time your evaluating a cloud computing vendor/solution or building your own, you know what to look for.

Monday, August 4, 2008

ISV Wish List

In the past couple of months we have been working with several ISVs to help them modernize their existing solutions/re-platform the existing application, and after some time a common patter start to emerge across all the ISV requirements - a ISV wish list if you will, which would like to summarize as:

a. Rich metadata driven extensible interface
b. Flexible and highly configurable business processes
c. Support for multiple back ends

In case of a and b (above): ISVs usually have their own professional service group that does the deployment for a new client, traditionally for each customization required by the new client the professional service group would custom develop it, however these ISVs going forward want to leverage a metadata driven approach whereby the UI and validation is dynamically created.

Up until now we have had to custom build many of these capabilities, however now with .NET 3.x several of these challenges can be met using the out of the box capabilities:

a. Rich metadata driven interface : WPF with XAML based declarative approach combined with loose XAML, the new data binding and data templatization model works extremely well for metadata driven solutions, additionally Prism (now WPF composite framework) can be used as the composite application framework container for building a thick client WPF application.

b. Flexible and highly configurable business processes: One stop solution, Windows Workflow Foundation (WF) in .NET 3.x. WF can be used to easily model complex business processes instead of writing large amount of code. Additionally we can leverage XAML activation - whereby the WF can be dynamically build and executed (i.e. dynamically compiled), this provides professional service teams with incredible flexibility to tailor/configure the business process for a given client.

c. Support for multiple back ends: Microsoft Entity Framework currently in CTP provides an entity modeling capability (ORM tool) that abstracts away the underlying DB product. All CRUD operations are performed on the entity data model and the entity framework can translate the CRUD operations to the specifics of the respective DB vendors. Additionally Entity Frameworks works nicely with WCF, all the classes that are generated behind the scene when you create the entity models are created with WCF serialization in mind.

Microsoft also informs us that they too see many ISVs having the same "wish list" and its expected that .NET 4.0 will support more of these capabilities natively.

Friday, August 1, 2008

Why Architecture is strategic

Every once in a while when we engage our clients/prospective clients on a consulting engagement they question the involvement of an technical architect in the engagement, cause they find that its an unnecessary expense and that the architecture can be developed/recommended by one of the existing development resources. Well that brings an interesting question. What is software architecture? Ask a bunch of people and a common response you get is that a software architecture defines the high level design and components to be used in the proposed system. Its correct but unfortunately that’s a very narrow perspective into what goes into defining an architecture, architecture is not just a technical decision of various software components - its strategic!. Its strategic because the architecture defines how you can monetize the product/service, the software architecture you define is what can help your product compete with other vendors in the same space and give you that edge. Not thinking of architecture as something that’s strategic is being shortsighted. To illustrate this let me give you an example.


Early this week we were on a call with Microsoft and a large enterprise in financial services to identify the high availability and disaster recovery (HA/DR) solution for their MOSS environment. The discussion finally lead down to identifying a suitable product for DR and we were discussing the many DR product options available in the market that work with MOSS. Microsoft with System Center Data Protection Manager 2007 (DPM 2007) presented a strong candidate for the client to consider from the sheer integration/understanding of the product and MOSS, however there were other vendors in DR space for MOSS as well and the financial service client was in favor of picking one of the other vendor, Microsoft sales knows how to play their cards well and dealt the trump card - Data Protection Manager (DPM) had architected their solution to leverage only the exposed MOSS APIs (which if your familiar with MOSS tells you that there are several limitations to this approach) and hence DPM is supported under Microsofts premium support where as several of the other DR vendors did not leverage the (limited)APIs but instead did direct DB access, this "hack" was easier to do and gave the vendors products more features to sell - however their product would not be supported by Microsofts premium support, i.e. if the client has any problems with their MOSS solution and calls Microsoft for support, Microsoft can deny support because they were using a DR product which is using an approach that is not supported by Microsoft. Bummer. The various DR vendors didn’t think about this in the architecture design, the "hack" they decided on which seemed technically appealing did not help them in the sale.

To the financial service client this was like a show stopper and immediately decided against its previous DR vendor selection. In the past several consulting engagements we have helped ISVs architecture their solution that avoid these kinds of pitfalls.

The next time your thinking of building an architecture are you thinking of how the architecture can empower your sales team with the strategic "battle card" to help win the deal? Is your architecture strategic?