Using Power BI Desktop to Create Data Visualizations and Explore Data

In previous posts, I discussed how to create an Azure SQL Database using data from the City of Chicago data portal. (See the previous posts on this blog for more information.) Part of that process showed how Microsoft Excel can be used in conjunction with Power Pivot and Power View to create data visualizations. Using a tool such as Excel to build visualizations is very practical: many people are comfortable working in Excel, and sharing the visualizations is simply a matter of sharing the workbook. There are, however, situations where a richer design canvas is required, or where you want to publish your visualizations to a wide audience that may not have Excel installed.

Microsoft recently released Power BI Desktop, a tool that allows you to create full-featured data visualizations on your desktop and then publish them to the Power BI service (which has both a free tier and a paid, professional tier), letting you share your visualizations with anyone who has a web browser and an Internet connection.

Downloading and Installing Power BI Desktop

Before attempting to install Power BI Desktop, read through the system requirements to make sure your system supports it. To download, click the download button on the Power BI Desktop page at https://powerbi.microsoft.com/desktop. Save the file to a local folder and then double-click it to install. Follow the instructions on the screen and install Power BI Desktop along with any prerequisites the installer determines it needs. Once the install completes, start Power BI Desktop, which opens the following screen:

Once the application starts, you are ready to build your first dashboard. For the purposes of this post, we’ll recreate the visualizations that we built from the same data source using Excel Power View. (See Part 3 in my previous posts)

Connecting to Azure SQL Database and Loading Data

Click on the Get Data button and select the Microsoft Azure SQL Database option.

Click the Connect button, enter your Azure SQL Database server and database information, and then click OK.

When prompted, select Database Authentication and enter your connection credentials.

If you receive an error related to the firewall rules, follow the instructions from Managing the Database in Part 1 and then click the Refresh Preview button to refresh the data connection. If you do not receive an error, skip to the Navigator section below.

Once the refresh is complete you will see a list of objects in the database (which will look slightly different than the picture below, as there are more objects in my database than I've documented in previous posts).

From the Close & Load drop-down, select Close to return to the blank canvas. When the canvas loads, click the Recent Sources button and select your database connection. This will open the Navigator.

Select the ChicagoCrimes, Community, DateDimension, PoliceStations and SocioEconomicFactors tables and then click the Load button.

The data load will start (during this process, Power BI Desktop loads the data into a local Data Model that uses the same xVelocity engine as Power Pivot) and will take some time to complete.

Once the data is loaded, we will want to ensure all of the appropriate relationships are in place, and we will use the DAX language to create a very simple calculated measure that makes common visualizations easy to build. You will also want to click the Save command to save the data model at this time.

Managing Data Relationships

When the data load completes, you'll be returned to the Power BI Desktop canvas. You will want to verify that the relationships between the fact table and the dimension tables were properly detected. (At the time I'm writing this article, there is an issue where Power BI Desktop does not detect the relationship between the DateDimension table and the ChicagoCrimes fact table.) Click the Relationships icon on the left side (the bottom button) to open the data diagram view.

If there are any missing relationships (in the image above, the relationship between the ChicagoCrimes table and the DateDimension table is missing), click the Manage Relationships button on the ribbon to open the Manage Relationships dialog.

Click New to create a new relationship. Select the ChicagoCrimes table from the first drop-down and highlight the DateOnly column. In the next drop-down, select the DateDimension table and verify that the Date column is selected. Then click OK.

Select Close and verify that the relationship has been created. When all the relationships are created, the data model should look like this:


Once the relationships are properly created, we’ll want to add a calculated measure to simplify the creation of visualizations using the model.

Creating a New Calculated Measure

For the purposes of this post, we will create a very simple calculated measure that counts the number of crime reports in the data. To create the measure, we'll use the DAX language (typically thought of as a Power Pivot language, but since Power BI Desktop uses the same engine, it applies here as well).

Select the Data icon on the left side (the middle icon) to open the data view, and ensure that the ChicagoCrimes table is selected in the Fields area on the right.

Click the New Measure button in the ribbon to enable the formula bar. Enter the DAX expression TotalReports = COUNT([ID]) and then click the checkmark icon to ensure that the expression is correct.
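For clarity on what the measure computes: DAX's COUNT([ID]) counts the non-blank values in the ID column, which for a key column is the same as the number of rows (COUNTROWS over the table would give the same result here). A minimal Python sketch of that semantics, using made-up ID values:

```python
# Simplified illustration of what the DAX measure COUNT([ID]) computes:
# the number of non-blank values in a column. The ID values below are
# hypothetical, not real crime report IDs.
ids = ["HX100", "HX101", None, "HX102"]

# Blank (None) entries are skipped, just as DAX COUNT skips blanks.
total_reports = sum(1 for v in ids if v is not None)
print(total_reports)  # 3
```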


Once the new measure is created, we will want to change the data type of the Year column in the DateDimension table so that appropriate reports can be easily created. Because Power BI Desktop is designed to minimize errors and simplify report creation, the data sometimes needs to be manipulated before Power BI can properly render visualizations of it. In the case of the DateDimension table, we want to use the Year column as an axis in a timeline. Because the Year column is currently defined as an Integer type, Power BI Desktop will treat the column as a measure and make it difficult to use on a chart axis. The simple workaround is to change the column's data type to Text.

In the fields section, select the DateDimension table and then select the Year column. From the ribbon, change the Data Type to Text.

Once these changes are made, you are ready to create a report based on the model.

Creating a Report

For the purposes of this post, we’ll recreate the visualization that was created in Excel Power View in Part 3 of the earlier blog series. Click the Report icon on the left side (the top icon) to open the report design canvas.

From the Fields area, select the TotalReports measure in the ChicagoCrimes table and then expand the DateDimension table and select the Year column. A column chart will be created on the canvas.

In the Visualizations area, select the Line Chart icon to change the visualization to a line chart. On the canvas, click a blank area and then in the Fields section, select the TotalReports measure as well as the Location Description field in the ChicagoCrimes table. On the canvas, drag the resulting visualization to the right of the line chart and resize it appropriately.

With the new chart selected, in the Visualizations area select the Bar Chart icon to change the visualization type to a bar chart. On the Bar chart, click the ellipsis (3 dots in the upper right) and sort the chart by the TotalReports field.

Click in a blank area on the canvas and then in the Fields area, select TotalReports from the ChicagoCrimes table and then expand the Community table and select the Communityad field. In the Visualizations area select the Map icon to change the visualization to a map, and then from the Fields area drag the Communityad field to the Location area under Visualizations. Resize the visualization appropriately.


After completing all of the steps above you will have a report that details all types of crime reports in Chicago from 2001 until the last date you loaded into your database. To filter the data to a specific type of crime, drag the Primary Type field from the ChicagoCrimes table to the Page Level Filters area under Visualizations. To replicate the report that was created in Excel, select the Homicide type in the filter.

Conclusion

In this blog post, we created a report using Power BI Desktop that connected to a Microsoft Azure SQL Database and detailed Homicides in the City of Chicago from 2001 until the last day of data that you loaded into your database.

Power BI Desktop is a powerful data analysis and visualization tool that allows you to gain insight from your data very quickly, and also share that insight by posting reports to the Power BI Service (future posts will discuss posting reports to the Power BI service).

Creating a “Real World” Database in Azure for Advanced Analytics Exploration: Part 4

The previous posts of this series (part 1, part 2, and part 3) focused on using publicly available crime report data to build a “real world” database in Azure, then showed how you can use some of the Power BI tools in Microsoft Excel to create interesting visualizations with that data.

In this post, I’ll walk through how to manually update the database with the latest data from the City of Chicago data portal so that your visualizations will be current. In future posts we’ll discuss how to automate this process, but for now, we’ll use a simplistic manual process.

Obtaining the Latest Data

The first step in updating the database is to determine the date and time of the last police report you have loaded into your database. To do this, use SQL Server Management Studio to connect to your database (if you need a refresher on how to do this, see the earlier posts in this series) and execute the following query:

SELECT MAX([Date]) FROM ChicagoCrimes;


This will return a single row with a DATETIME value such as: 2015-07-01 23:58:00.000

The City of Chicago Data Portal makes crime data available in 24-hour cycles and posts complete cycles on a 7-day lag. (In other words, generally speaking, the data in the portal runs through the last complete day from one week ago.) Based on the date above (7/1/2015 at two minutes to midnight; Iron Maiden reference not intended), we can assume that all police reports through July 1, 2015 have been uploaded to our database, and we'll need to start with police reports from July 2.
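The date arithmetic here is simple but worth getting right. A minimal Python sketch, using the example date above, of how the next load's start date and the newest day the portal can be expected to have are derived (the seven-day lag is the approximation described in the text, not a guaranteed schedule):

```python
from datetime import datetime, timedelta

# Last report currently in the database (the SELECT MAX([Date]) result above).
last_loaded = datetime(2015, 7, 1, 23, 58)

# The next load should start with reports from the following day.
next_start = (last_loaded + timedelta(days=1)).date()

def newest_available(today):
    # The portal posts complete 24-hour cycles on a roughly 7-day lag,
    # so the newest complete day available is about a week behind today.
    return today - timedelta(days=7)

print(next_start)  # 2015-07-02
print(newest_available(datetime(2015, 7, 9).date()))  # 2015-07-02
```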

To download the latest police report data, browse to the City of Chicago’s data portal which is available at https://data.cityofchicago.org/ and select the Crimes – 2001 to present link:

This will open the dataset in a browser window with several options available to you:

In the upper-right-hand section of the window, you will see a Filter option. Select that option to open the filter pane:

Select the Add a New Filter Condition button to add a filter. Select the Date field, and then select the Is After option, and then select the target date (in my example, July 2) as shown:

After a few moments, the window will refresh and you will have a much smaller dataset than what you started with. Once the screen refreshes, in the upper-right-hand side of the screen, select the Export button:

This will open the Download dialog, where you will want to choose the CSV for Excel option and then save the file to your local hard drive.

Once the file is saved, you will be ready to upload it to your Azure database environment.

Updating the Database

The first post in this series discussed downloading the initial data from the City of Chicago and uploading it to a temporary table in Azure SQL Database. The first part of the update process is exactly the same: upload the new data to a temporary table (for the purposes of this post, we'll use the same table we created during the initial load), and then use T-SQL to update the fact table in our database.

Preparing the Temporary Table

Before we upload the new data, we will want to truncate the temporary table. This step is not strictly necessary, as the T-SQL query we'll run will not insert duplicates into the fact table; however, for efficiency's sake we'll want to start with a blank table.

Connect to your Azure SQL Database instance and execute the following T-SQL command:

TRUNCATE TABLE Crimes_New;


This will ensure that the temporary table that we used during the initial creation of the database is ready to use again. If for some reason you deleted the initial temporary table, you can create it with the following T-SQL script:

CREATE TABLE [dbo].[Crimes_New](
    [ID] [varchar](50) NOT NULL PRIMARY KEY,
    [Case Number] [varchar](50) NULL,
    [Date] [varchar](50) NULL,
    [Block] [varchar](50) NULL,
    [IUCR] [varchar](50) NULL,
    [Primary Type] [varchar](50) NULL,
    [Description] [varchar](max) NULL,
    [Location Description] [varchar](50) NULL,
    [Arrest] [varchar](50) NULL,
    [Domestic] [varchar](50) NULL,
    [Beat] [varchar](50) NULL,
    [District] [varchar](50) NULL,
    [Ward] [varchar](50) NULL,
    [Community Area] [varchar](50) NULL,
    [FBI Code] [varchar](50) NULL,
    [X Coordinate] [varchar](50) NULL,
    [Y Coordinate] [varchar](50) NULL,
    [Year] [varchar](50) NULL,
    [Updated On] [varchar](50) NULL,
    [Latitude] [varchar](50) NULL,
    [Longitude] [varchar](50) NULL,
    [Location] [varchar](50) NULL
);


Once the table is truncated, we’re ready to use the Import/Export Wizard to upload the new data.

Uploading New Data

Start the SQL Server Import and Export Wizard (from the Start menu, search for SQL Server Import and Export Data) and choose Flat File Source from the Data source drop-down menu. Browse for the file that you downloaded earlier, and add a double quote (") to the Text qualifier textbox:

Select the Columns option on the left-hand side and verify that the columns are aligned properly:

Then select the Advanced option on the left-hand side and select the Description column. Change the Data Type to text stream [DT_TEXT] to ensure that the full width of the Description column will be uploaded.

Click Next and then select the SQL Server Native Client x.x option from the Destination drop-down (if you have the SQL Server 2014 tools installed, your Native Client version will be 11.0). Enter your Azure SQL Database server name in the Server name textbox and choose the Use SQL Server Authentication radio button. Enter your user name and password to connect to the Azure SQL Database instance, and then select the ChicagoCrime database in the Database drop-down menu.

Choose Next, and then select the [dbo].[Crimes_New] table in the Destination:<servername> field:

Click Next, and ensure the Run immediately checkbox is selected and then click Finish to begin the import.

Once the import is complete (depending on how much time has elapsed between the initial load and the update, this process will either be very quick or take a few minutes), you will see the following:

Click Close to close the Import Export Wizard. Once the temporary table has been loaded, you can connect to the database and transfer the data to the fact table in the database.

Updating the Fact Table with the New Data

Adding the new data to the ChicagoCrimes fact table is a relatively straightforward process. You could easily modify the process above to load the data directly into the fact table from the text file; however, by separating the upload from the update, you ensure that any errors that occur can be corrected easily.

To update the fact table, connect to your database instance using SQL Server Management Studio and execute the following script:

INSERT INTO ChicagoCrimes
SELECT [ID]
    ,[Case Number]
    ,[Date]
    ,[Block]
    ,[IUCR]
    ,[Primary Type]
    ,[Description]
    ,[Location Description]
    ,[Arrest]
    ,[Domestic]
    ,[Beat]
    ,[District]
    ,[Ward]
    ,[Community Area]
    ,[FBI Code]
    ,[X Coordinate]
    ,[Y Coordinate]
    ,[Year]
    ,[Updated On]
    ,[Latitude]
    ,[Longitude]
    ,[Location]
    ,CAST([Date] AS DATE)
FROM [dbo].[Crimes_New]
WHERE [Date] > (SELECT MAX([Date]) FROM ChicagoCrimes);

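The reason this script is safe to re-run is the WHERE clause: only staging rows newer than the fact table's current maximum date are copied, so any overlap between the export and what is already loaded is skipped. A minimal sketch of the same pattern using Python's built-in sqlite3 module (the table and column names here are simplified stand-ins, not the real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified stand-ins for the fact and staging tables.
cur.execute("CREATE TABLE fact (id TEXT, report_date TEXT)")
cur.execute("CREATE TABLE staging (id TEXT, report_date TEXT)")

# The fact table already holds data through 2015-07-01.
cur.executemany("INSERT INTO fact VALUES (?, ?)",
                [("1", "2015-06-30"), ("2", "2015-07-01")])

# Staging holds one overlapping row plus genuinely new rows.
cur.executemany("INSERT INTO staging VALUES (?, ?)",
                [("2", "2015-07-01"), ("3", "2015-07-02"), ("4", "2015-07-03")])

# Same pattern as the T-SQL above: copy only rows newer than the
# fact table's current maximum date, so re-running is harmless.
cur.execute("""
    INSERT INTO fact
    SELECT id, report_date FROM staging
    WHERE report_date > (SELECT MAX(report_date) FROM fact)
""")

rows = cur.execute("SELECT id FROM fact ORDER BY id").fetchall()
print([r[0] for r in rows])  # ['1', '2', '3', '4'] -- the overlap row was skipped
```

Running the INSERT a second time copies nothing, because the fact table's maximum date has already advanced past every staging row.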

When the script finishes, you should see the same number of rows affected as you saw transferred in the Import Export Wizard:

Once the script is complete, the fact table in your database will be updated with the latest information available. You can verify that the data was updated properly by re-running the following T-SQL query and comparing the date returned with the initial date from above:

SELECT MAX([Date]) FROM ChicagoCrimes;


You should see a newer date returned.

Updating the Excel Workbook

In Part 3 of this series we used an Excel workbook, a Power Pivot model, and Power View to create a simple visualization of crime data. Once the fact table has been updated, you'll want to return to the workbook and refresh the data to ensure the new rows are loaded into the visualizations.

Start Excel and open the workbook you created in Part 3. When the workbook opens, click the Enable Content button:

Select Power Pivot from the ribbon and select the Manage command to open the Power Pivot window:

From the ribbon, select the Refresh drop-down and choose Refresh All. When prompted, enter the password for the database. When the refresh is complete, you should see the following dialog:

Note that the refresh will repopulate the entire Power Pivot model, not just the added rows.

Conclusion

In this post, we downloaded the latest information from the City of Chicago data portal and manually updated the database with it. We also refreshed the Excel workbook visualizations created earlier. This process can be re-run any time you want to add updated data to your database.

Future posts in this series will detail how to use the Azure Data Factory to automate the process of updating the data. I will also be creating a new series of posts that will use the data from this series to create an Azure Machine Learning predictive analytics model.