Sunday, January 27, 2008

MOSS Requirement gathering/ user survey templates

Microsoft has already provided several MOSS planning templates http://technet2.microsoft.com/Office/en-us/library/0ed0b44c-d60d-4b85-87de-19065d9688351033.mspx?mfr=true , each planning section has details of the planning along with some sample templates, most of these template are for gathering information from end users.

Based on the MOSS capabilities your building, you can use the corresponding templates.

For instance planning template for
· Document usage http://go.microsoft.com/fwlink/?LinkId=73284&clcid=0x409
· Stakeholders http://go.microsoft.com/fwlink/?LinkId=73285&clcid=0x409
· Document libraries http://go.microsoft.com/fwlink/?LinkId=77248&clcid=0x409
· Document Content types and workflow http://go.microsoft.com/fwlink/?LinkId=73292&clcid=0x409
· Search http://go.microsoft.com/fwlink/?LinkId=81039&clcid=0x409

Thursday, January 24, 2008

SQL Server - Optimize as an in memory database

Currently working on a migration project for Java+Oracle to Java+SQL Server as you may expect there are several documented approaches for doing this migration however the challenge in this case is that the client is using Berkeley DB (in memory DB licensed by Oracle) in their distributed architecture and Microsoft currently doesn’t have a corresponding in memory DB product. So how can we build a database centric migration strategy (i.e. no change to client code)?
Initial brainstorming helped us identify 2 possible approaches to replacing the Berkeley DB: a. Use SQLCE for data caching b. Use SQL Express for data caching (also applies to SQL Standard and Enterprise skew) In both the above choices the key to arriving at the target solution is buffer pool management, SQL Server internally tries to reduce the disk I/O by storing data pages into the buffer pool.
SQL CE seemed a good choice initially because its an inproc DB an this inherent nature could improve query performance however we concluded that it was not the right fit since:•
  • SQL CE doesn’t have a native JDBC compliant driver, worst case we will have to use some .NET interop like JNBridge/ Web Services etc.
  • SQL CE doesn’t expose performance counters it will be very difficult to evaluate performance and SQL CE behavior under load, we can only measure response time and throughput but would not be able to observe internal resource utilization.
  • SQL CE doesn’t support very high level of concurrency (ideally meant for desktop and device usage)
  • SQL CE does not offers good buffer pool management

So with SQL CE out of the way we could just focus on leveraging SQL Express and optimizing the buffer pool management. So how does the buffer pool work?
When SQL Server starts it calculates the virtual address space for the buffer based on several parameters the most important being the physical memory on the system. SQL Server only identifies the max amount it should reserve for the buffer pool it doesn’t reserve the memory on startup it only commits memory on a need basis. The DMV sys.dm_os_sys_info can be used to view the amount of buffer memory, bpool_commit_target show "target" memory limit for the buffer pool and bpool_committed displays the amount of memory currently committed to the buffer pool( both number represent the number of 8 KB pages). This is also impacted by the configuration in the min server memory and max server memory. Note that when SQL Server starts it tries to commit large number of pages to buffer pool memory this is called ramp up time, SQL 2005 Enterprise edition tries to improve the performance by allocating 8 pages at a time (i.e. 64KB). Buffer manager performs disk I/O asynchronously. Optimizing the table data types and length to fit optimally into pages, identifying the indexing strategy + index usage and optimizing the I/O subsystem would all be important in optimizing the buffer pool management.
Scenarios that are not discussed in this blog which impact the buffer pool include
  • AWE (for 32 bit environment) - most organizations are going for 64bit servers so AWE itself would fade away
  • NUMA

There is news in the air that Microsoft is planning to release an in-memory DB however we are under NDA not to disclose further information, will blog about it separately when the product plans are more concrete and when the NDA is lifted.

Friday, January 4, 2008

Using MSBI for Claim Reconciliation

Using MSBI for Claims Reconciliation

Many insurance companies have recognized the need to improve the efficiency of their claims management process. They have streamlined processes, eliminated paper-based forms and redistributed work to match the demands to skills. The objective of their efforts is to lower costs, while also increasing overall throughput.

Scenario:
The customer is a huge multi-national insurance company and has a claims group to manage the claims process. When we talk about thousands of insurances being provided to various customers (general public and companies), we are also looking at thousands of claims being tracked and processed every day. And as we explained above, there needs to be a stringent process of claims analysis to do Claims Reconciliation.

The Objective:
The customer was looking for a solution that could solve his problem (explained below). He found a lot of tools that could solve only his individual problems but proved tough when it came to integrating with other tools. He wanted to look into the Microsoft BI Suite of products if that was a solution as he had seen one of the demos in tech-ed where the speaker repeatedly mentioned Microsoft Integrated BI Platform. Luckily I got a chance to work with this client on a proof of concept and in a 2 weeks effort we were able to solve his problems using the MS BI suite.
I wanted to share my experience working on this specific problem for the reconciliation process.

The Process:
Claims data was collected from 4 different sources. Details collected included low level details from Checks issued for a Claim to higher level data like the Account and the Program the claim belongs to.
Once data was received from these sources, the Analyst did the following activities:
· Used excel to extract data dumps from sources using queries.
· Find out the Programs that showed inconsistencies across the different source data.
· Then manually drill down from Program to Account to Claim and then to the Check level detail to find out the transaction to be observed.
· It was first assessed by the backend data analysts to see if the inconsistency was not due to any bad data load or extract.
· If not, it was reported to the concerned manager who passed it through the reporting hierarchy to the Program manager who then had to take some action if it was of concern.

The Problem:
As you can see, the process involved
· A lot of manual effort from the user to aggregate the data and to find out the inconsistent transactions from the thousands of transaction coming in everyday.
· The process was too cumbersome and time consuming as most of the things were done manually.
· By the time the results reached to the relevant people, it was too late further delaying the decision making process.

The Solution:
An integrated data warehouse was built collecting data from the 4 different sources. Confirmed dimensions were built for Claim, Account, Program so that it could be analyzed across the data. We used Integration Services to do the ETL.

Now came the challenging part of building the Analysis services Cube for this.
There was a lot of data to be reconciled meaning loaded, analyzed, corrected and re-loaded again before being viewed by peer groups. During the correction process, the analysts wanted to make changes on the fly and and wanted to see this changes reflected in the reports. If we had built just a MOLAP storage structure, it would have taken a lot of time due to frequent corrections and reloading leading to frequent processing, so we followed the below process:
·Created 4 measure groups for data retrived from each of the 4 sources.
·Created Month wise partitions across all measure groups with MOLAP storage and the latest month partition with ROLAP storage. This reduced great amount of time and effort in re-processing the partition whenever changes were made.
·Standard Summary reporting services reports were created on top of the integrated cube showing the aggregated check amount at higher and lower levels. Once data was loaded into the data warehouse and the cube processed, the reports could be viewed by any authticated user. The logic of integrating, aggregating and finding anomalies were all built into the solution. Analyst now had to just view the reports which had alerts showing these anomalies and not worry about manually finding them. This speeded the process of responsible stake holders to take faster actions and resolving issues quickly.
·Drill down reports were used to get into the respective claims and to the checks very easily and do the correction or report it.
·To get away with the requirement of analysts manipulating the amounts for the respective checks, we created Actions in the cube calling an asp .net web page passing respective parameters where they did the changes. The data manipulated directly reflected the data in the Data Warehouse and was soon available for reporting as the latest month data was stored in ROLAP model so the partition processing was very negligible.
·All this analysis was done from the Sharepoint interface where all the standard reports/views were deployed once the data was available. All users used this portal to view the application.

Conclusion
·The Actions in analysis services come in quite very handy. You can perform a variety of things like call any web page, drill down to database, etc. Certain complex requirements can be very easily handled.
·The data storage mechanism in analysis services provides designers to make a very robust and flexible design providing real time data changes to the data.
Hope you all enjoy reading this. Thanks