Full-stack data analytics project

HRM Crime Analytics

An end-to-end pipeline covering ETL, data warehousing, unsupervised machine learning, and interactive reporting — built to answer one question: is Halifax safe?

Records: 1,028 incidents
Coverage: 68 days
Period: Jan 14 – Mar 2026
Status: Active collection

1,028 Total Crimes (Jan–Mar 2026)
54.5% Assault Share (560 of 1,028)
+30.5% Month-over-Month (Feb → Mar 2026)
7.2% High Severity (74 incidents)
4 Geo-Clusters (K-Means, coordinates only)
5 Report Pages (Power BI + DAX)

A question that felt personal

I'm an international student, and when I arrived in Halifax I had no idea what the city was really like. I'd heard rumours that Halifax was a human trafficking hotspot, and I wanted to find out if there was any data to back that up.

When I went looking, I couldn't find human trafficking data — but I found something else: the HRM Open Data Hub publishes a rolling 7-day dataset of general occurrence crimes. The catch? No historical archive. Only the last 7 days are ever available.

So I built one myself. I manually downloaded the CSV every other day starting January 14, 2026, and loaded each file through my SSIS pipeline into a SQL Server data warehouse. What started as a personal safety question turned into a full data engineering and analytics project.

Five categories, one dominant

Assault comprises more than half of all reported incidents — and it appears in every geographic cluster, not just downtown.

Assault: 560
Theft from Vehicle: 208
Break & Enter: 96
Theft of Vehicle: 82
Robbery: 49

A sustained climb, not a spike

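Reported incidents rose in each of the three months. The rate of growth is slowing, but volume is still rising, and March brought the highest severity rate yet. Collection only began on January 14, so the January count covers a partial month and part of the jump into February reflects that shorter window.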

January: 242 incidents (baseline)

Assault was already at 50% of the total. Gloria McCluskey Ave and Gottingen St were hotspots from the start. High severity stood at 6.6%.

February: 341 incidents (+40.9%)

The sharpest jump in the dataset. High severity actually dropped to 6.2%: more crimes, but proportionally fewer serious ones. Spring Garden Rd entered the top 5.

March: 445 incidents (+30.5%)

The most concerning month: highest volume and highest severity (8.3%) at the same time. Mumford Rd surged from 2 to 17 incidents.
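The month-over-month percentages above follow directly from the three monthly counts. Here is a minimal sketch of that arithmetic in Python. The project itself computes this as a DAX measure, so the function name and structure here are illustrative only.

```python
# Monthly incident counts as reported above.
monthly_counts = {"January": 242, "February": 341, "March": 445}

def month_over_month(counts):
    """Return the percent change of each month relative to the previous one."""
    months = list(counts)
    changes = {}
    for prev, curr in zip(months, months[1:]):
        change = (counts[curr] - counts[prev]) / counts[prev] * 100
        changes[curr] = round(change, 1)
    return changes

growth = month_over_month(monthly_counts)
print(growth)  # {'February': 40.9, 'March': 30.5}
```

Growth is still positive in March but smaller than in February, which is what "slowing but still rising" means here.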

How I built it

A full-stack pipeline from manual CSV collection through to interactive Power BI reporting, with unsupervised ML in between.

Data Engineering

SSIS Foreach Loop pipeline with staging, incremental dimension loading, fact table, and pre-aggregated reporting tables. Error rows redirected, not dropped.

SSIS, SQL Server, Star Schema, Visual Studio

Machine Learning

K-Means clustering on coordinates only (k=4). The optimal k was found with the elbow method (via KneeLocator from kneed) and validated with a silhouette score. Geography alone predicted crime type.

scikit-learn, pandas, numpy, plotly, kneed

Business Intelligence

5-page Power BI report with DAX measures for KPIs, MoM comparison, 7-day moving average, severity percentages, and Q2 forecast with confidence intervals.

Power BI, DAX, Power Query, Forecasting

Data Warehouse

Star schema with IncidentID grain. Staging → Dimensions → Fact → Reporting layers. Designed for rerunning without duplicates and future event type expansion.

Star Schema, Fact Table, Dimensions, Incremental Load

Six-step data flow

From CSV files to Power BI — every step designed to be re-runnable, fault-tolerant, and transparent about errors.

01 — Enumerate Crime Data

Foreach Loop container iterates over all CSVs in source folder. File path dynamically bound to package variable — drop any number of new files and run once.
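For readers who don't use SSIS, the enumeration step behaves roughly like the Python loop below. This is only an analogy, not the package itself: `enumerate_crime_files`, `run_pipeline`, and the `load_file` callback are hypothetical names standing in for the Foreach Loop container and the data flow it drives.

```python
from pathlib import Path

def enumerate_crime_files(source_folder):
    """Return every CSV in the folder, sorted so reruns see a stable order."""
    return sorted(Path(source_folder).glob("*.csv"))

def run_pipeline(source_folder, load_file):
    """Call load_file once per CSV, like a loop body bound to a file-path variable."""
    processed = []
    for csv_path in enumerate_crime_files(source_folder):
        load_file(csv_path)  # hypothetical per-file staging step
        processed.append(csv_path.name)
    return processed
```

Because the folder is scanned on every run, adding new downloads needs no code change. Non-CSV files in the folder are ignored.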

02 — Staging

Flat File Source → OLE DB Destination into stg.HalifaxCrime. Error rows redirected (not failed) — pipeline keeps running and logs bad records for review.
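The redirect-instead-of-fail idea can be sketched in Python as follows. It assumes a reduced file layout with only `OBJECTID`, `x`, and `y` columns, and it assumes the staging key is a 32-bit integer. Both are illustrative guesses rather than the real stg.HalifaxCrime definition.

```python
import csv
import io

INT32_MAX = 2**31 - 1  # assumption: the staging column is a 32-bit integer

def stage_rows(csv_text):
    """Split rows into staged rows and redirected error rows; never abort the load."""
    staged, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            object_id = int(row["OBJECTID"])
            if object_id > INT32_MAX:
                raise OverflowError("OBJECTID exceeds 32-bit range")
            staged.append({"OBJECTID": object_id,
                           "x": float(row["x"]),
                           "y": float(row["y"])})
        except (ValueError, OverflowError, KeyError, TypeError) as exc:
            # Bad record: keep it for review instead of failing the whole file.
            errors.append({"line": line_no, "reason": str(exc), "raw": row})
    return staged, errors

sample = (
    "OBJECTID,x,y\n"
    "1,-63.57,44.65\n"
    "99999999999,-63.60,44.66\n"   # overflows the assumed 32-bit key
    "3,not_a_number,44.64\n"       # unparseable coordinate
    "4,-63.58,44.67\n"
)
staged, errors = stage_rows(sample)
print(len(staged), [e["line"] for e in errors])  # 2 [3, 4]
```

The two good rows still load, and each rejected row keeps its line number and reason so it can be reviewed later.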

03 — Dimension Loading

Three dimensions loaded incrementally: dim.Crime (type, category, severity), dim.Event (reporting type), dim.Location (street address). A NOT IN subquery against each dimension keeps reruns from inserting duplicate rows.
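The NOT IN pattern for an incremental dimension load looks roughly like this. The sketch runs on an in-memory SQLite database so it is self-contained. The real warehouse is SQL Server, and the flat table names and the `location` / `StreetAddress` columns are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_HalifaxCrime (location TEXT);
    CREATE TABLE dim_Location (
        LocationKey   INTEGER PRIMARY KEY AUTOINCREMENT,
        StreetAddress TEXT NOT NULL
    );
""")

def load_dim_location(conn):
    """Insert only the staged addresses the dimension does not already hold."""
    conn.execute("""
        INSERT INTO dim_Location (StreetAddress)
        SELECT DISTINCT s.location
        FROM stg_HalifaxCrime AS s
        WHERE s.location IS NOT NULL
          AND s.location NOT IN (SELECT StreetAddress FROM dim_Location)
    """)
    conn.commit()

conn.executemany(
    "INSERT INTO stg_HalifaxCrime VALUES (?)",
    [("GOTTINGEN ST",), ("MUMFORD RD",), ("GOTTINGEN ST",)],
)
load_dim_location(conn)
load_dim_location(conn)  # rerun: nothing new to insert
count = conn.execute("SELECT COUNT(*) FROM dim_Location").fetchone()[0]
print(count)  # 2
```

One thing to watch with NOT IN: if the subquery can return NULL, the predicate never evaluates to true and nothing gets inserted. The NOT NULL constraint on the dimension column guards against that here.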

04 — Fact Table

fact.CrimeIncidents — one row per incident with foreign keys to all dimensions plus date key, coordinates (x, y), and source record identifier.
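A fact load along these lines resolves each staged row to its dimension keys and skips records that are already loaded. The sketch below again uses SQLite in memory, leaves out dim.Event for brevity, and assumes the source identifier stays stable across downloads. The column names, and SQLite's `INSERT OR IGNORE` standing in for whatever duplicate check the real package uses, are all illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_Crime    (CrimeKey INTEGER PRIMARY KEY, CrimeType TEXT UNIQUE);
    CREATE TABLE dim_Location (LocationKey INTEGER PRIMARY KEY, StreetAddress TEXT UNIQUE);
    CREATE TABLE stg_HalifaxCrime (
        OBJECTID INTEGER, crime_type TEXT, location TEXT,
        report_date TEXT, x REAL, y REAL
    );
    CREATE TABLE fact_CrimeIncidents (
        IncidentID     INTEGER PRIMARY KEY AUTOINCREMENT,  -- grain: one row per incident
        SourceObjectID INTEGER UNIQUE,                     -- source record identifier
        CrimeKey       INTEGER REFERENCES dim_Crime (CrimeKey),
        LocationKey    INTEGER REFERENCES dim_Location (LocationKey),
        DateKey        INTEGER,                            -- yyyymmdd
        x REAL, y REAL
    );
""")

def load_fact(conn):
    """Look up surrogate keys and insert incidents that are not in the fact table yet."""
    conn.execute("""
        INSERT OR IGNORE INTO fact_CrimeIncidents
            (SourceObjectID, CrimeKey, LocationKey, DateKey, x, y)
        SELECT s.OBJECTID, c.CrimeKey, l.LocationKey,
               CAST(strftime('%Y%m%d', s.report_date) AS INTEGER), s.x, s.y
        FROM stg_HalifaxCrime AS s
        JOIN dim_Crime    AS c ON c.CrimeType = s.crime_type
        JOIN dim_Location AS l ON l.StreetAddress = s.location
    """)
    conn.commit()

conn.execute("INSERT INTO dim_Crime (CrimeType) VALUES ('ASSAULT')")
conn.execute("INSERT INTO dim_Location (StreetAddress) VALUES ('GOTTINGEN ST')")
conn.executemany(
    "INSERT INTO stg_HalifaxCrime VALUES (?, ?, ?, ?, ?, ?)",
    [(101, "ASSAULT", "GOTTINGEN ST", "2026-01-14", -63.58, 44.65),
     (101, "ASSAULT", "GOTTINGEN ST", "2026-01-14", -63.58, 44.65),  # same record, later file
     (102, "ASSAULT", "GOTTINGEN ST", "2026-01-15", -63.58, 44.65)],
)
load_fact(conn)
load_fact(conn)  # rerun adds nothing
rows = conn.execute(
    "SELECT SourceObjectID, DateKey FROM fact_CrimeIncidents ORDER BY SourceObjectID"
).fetchall()
print(rows)  # [(101, 20260114), (102, 20260115)]
```

The duplicate handling matters because a 7-day rolling file downloaded every other day repeats most of its records from one download to the next.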

05 — Reporting Tables

Pre-aggregated tables: rpt.ByCategory, rpt.ByLocation, rpt.BySeverity. They give Power BI a lighter query layer, so summary visuals don't have to hit the fact table every time.

06 — Calendar Table (DAX)

Generated in Power BI — spans ~6 years with year, month, week number, and weekday/weekend flags for time intelligence measures and forecasting.
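The calendar table itself is written in DAX. The Python sketch below only shows the shape of the rows it produces. The one-year date range is an arbitrary example, not the roughly six-year span used in the report, and the column names are assumptions.

```python
from datetime import date, timedelta

def build_calendar(start, end):
    """Return one row per day between start and end (inclusive)."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "Date": day,
            "Year": day.year,
            "Month": day.month,
            "WeekNumber": day.isocalendar()[1],  # ISO week; DAX WEEKNUM defaults differ
            "IsWeekend": day.weekday() >= 5,     # Monday=0 ... Saturday=5, Sunday=6
        })
        day += timedelta(days=1)
    return rows

calendar = build_calendar(date(2026, 1, 1), date(2026, 12, 31))
print(len(calendar))  # 365
```

A continuous calendar with no missing dates is what lets time-intelligence measures such as moving averages and month-over-month comparisons line up correctly, even on days with no incidents.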

Four zones, distinct crime profiles

K-Means was fed only x/y coordinates — no crime type data at all. Yet the clusters came back criminologically coherent. Geography alone predicts crime type in HRM.

Urban Core (786 incidents): Downtown Halifax/Dartmouth peninsula. All 5 crime types, highest density. 96% of all robberies.
Suburban (148 incidents): Bedford, Clayton Park, suburban Dartmouth. Property crime more prominent.
Rural North (21 incidents): Fall River, Sackville corridor. Low volume, assault-dominant. Zero robberies.
Remote East (12 incidents): Eastern Passage, Lawrencetown. Isolated incidents, only 3 crime types present.
Cluster | Assault | Break & Enter | Robbery | Theft from Vehicle | Theft of Vehicle | Total
Urban Core | 406 | 75 | 47 | 189 | 69 | 786
Suburban | 107 | 16 | 2 | 17 | 6 | 148
Rural North | 13 | 3 | 0 | 2 | 3 | 21
Remote East | 6 | 2 | 0 | 0 | 4 | 12
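The sketch below illustrates clustering on coordinates alone and choosing k by silhouette score. It is a dependency-free stand-in, not the project code. The project used scikit-learn's K-Means with the elbow method and kneed's KneeLocator, whereas this version is a plain Lloyd's algorithm with a simple deterministic initialisation and uses the silhouette score alone. The points are synthetic blobs, not HRM coordinates.

```python
import math
import random

def avg(values):
    return sum(values) / len(values)

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(avg(axis) for axis in zip(*members)) if members else centers[j])
        if updated == centers:  # converged
            break
        centers = updated
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b), averaged over all points."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:  # singleton cluster, or only one cluster overall
            scores.append(0.0)
            continue
        a = avg(own)                               # mean distance within own cluster
        b = min(avg(d) for d in dists.values())    # mean distance to nearest other cluster
        scores.append((b - a) / max(a, b))
    return avg(scores)

rng = random.Random(0)
blob_centres = [(0, 0), (10, 0), (0, 10), (10, 10)]  # synthetic, not HRM coordinates
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blob_centres for _ in range(20)]

scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k)  # 4 for this synthetic layout
```

Crime type never enters the clustering. A cross-tab like the table above is built afterwards by joining the cluster labels back to the incident records.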

What the data reveals

Ten findings from 68 days of manually collected crime data across Halifax Regional Municipality.

Severity

Assault dominates everywhere

54.5% of all incidents and present in every geographic cluster — not a downtown-only problem. It follows people regardless of location.

Volume

Unbroken monthly growth

242 → 341 → 445 across Jan–Mar. The rate of growth is slowing, but volume is still climbing every single month.

Severity

March raised both alarms

Highest volume and highest severity (8.3%) simultaneously. That's the combination worth watching most closely.

Temporal

72% happen on weekdays

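Property crimes tend to be discovered and reported during business hours, so the distribution reflects reporting behaviour as much as occurrence. Weekdays are also 5 of every 7 days (about 71%), so a 72% share is close to what an even daily spread would produce.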

Spatial

Urban Core concentrates crime

72% of assaults, 91% of thefts from vehicle (189 of 208), and 96% of robberies fall in the Urban Core cluster.

Spatial

Robbery is urban-exclusive

Zero robbery incidents in Rural North or Remote East. It needs victim proximity that rural areas don't provide.

Volume

Mumford Road surged

2 incidents in February → 17 in March. The biggest single location shift in the dataset — possible geographic displacement.

Spatial

Top 5 are corridors, not neighbourhoods

All hotspot locations are busy urban thoroughfares: Gloria McCluskey Ave, Gottingen St, Mumford Rd, Spring Garden Rd, Barrington St.

Temporal

2–3 week cyclical pattern

The 7-day moving average shows rhythmic peaks and troughs, with rising floors — even quiet weeks are getting busier.
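For reference, a trailing 7-day moving average can be computed as in this minimal sketch. The report does this with a DAX measure. The daily counts below are made up for illustration.

```python
def moving_average(daily_counts, window=7):
    """Trailing mean over the last `window` days; None until a full window exists."""
    averages = []
    for i in range(len(daily_counts)):
        if i + 1 < window:
            averages.append(None)
        else:
            window_slice = daily_counts[i + 1 - window : i + 1]
            averages.append(sum(window_slice) / window)
    return averages

daily = [10, 12, 9, 14, 11, 13, 15, 17]  # made-up daily incident counts
print(moving_average(daily))
# [None, None, None, None, None, None, 12.0, 13.0]
```

If a missed collection day is entered as zero, it pulls down every window that contains it. That is why gaps in collection can look like real dips.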

Spatial

Geography predicts crime type

K-Means used only coordinates — no crime type input — yet produced criminologically distinct clusters. Geography is a strong natural predictor.

What's ahead

Built entirely in Power BI using DAX. The forecast extends from observed data (~397–434 in late March/early April) through June 2026. The March baseline projection — which I trust most — suggests a possible plateau in the 400–480 crimes/month range through Q2.

The confidence interval widens deliberately as it projects further out. With only 68 days of training data, showing a narrow cone would be dishonest. The lower bound never drops below March's observed floor, meaning the model treats March's activity level as the new baseline minimum.

Data-informed suggestions

Not policy directives — these come directly from the numbers, intended for different audiences who might engage with this project.

Newcomers & International Students

Be informed, not afraid

The five highest-incident locations are busy urban corridors. Assault is everywhere at 54% — awareness matters in every part of the city. 96% of robberies are Urban Core, and weekdays see 72% of reported crimes.

Residents & Community

Watch the suburban shift

Property crime is creeping into suburbs — Bedford, Clayton Park, suburban Dartmouth. Mumford Road's 2→17 surge is the sharpest location-level change in the dataset. Even rural clusters have assault incidents.

Law Enforcement

Urban Core, weekday-weighted

The Urban Core is the highest-leverage target. Data supports weekday-weighted deployment. The 2–3 week cycle suggests behavioural rhythms worth investigating. March is a warning sign for resource calibration.

Municipal Policy

Publish historical data

The 7-day rolling window is a gap in HRM's own analytical capacity. Extending to 90 days would enable evidence-based planning at minimal cost. Open data enables community-led analysis like this project.

Limitations & known gaps

Every gap documented. Every dip in the moving average checked against this list before drawing conclusions.

Limitation | Impact | Mitigation
7-day rolling source window | Can't backfill missed days | Every gap documented in README
Missing collection days | Artificial dips in moving average | Flagged throughout analysis
No time-of-day field | Can't analyse peak crime hours | Flagged as future improvement
Only 68 days of data | Wide forecast confidence intervals | Kept confidence cone honest
1 error record (OBJECTID overflow) | Excluded from analysis | Surfaced in ETL dashboard
Coordinate duplication | 1,028 incidents → 795 unique points | Acknowledged and documented
No demographic data | Can't profile victims or offenders | Outside scope; public data limitation
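The coordinate duplication figure in the table comes from comparing the number of incidents with the number of distinct (x, y) pairs. Here is a minimal sketch of that check on made-up coordinates. The function name is hypothetical.

```python
from collections import Counter

def coordinate_summary(points):
    """Count incidents, distinct (x, y) points, and points shared by several incidents."""
    counts = Counter(points)
    shared = {pt: n for pt, n in counts.items() if n > 1}
    return {
        "incidents": len(points),
        "unique_points": len(counts),
        "shared_points": shared,
    }

sample = [(-63.58, 44.65), (-63.58, 44.65), (-63.60, 44.66), (-63.58, 44.65)]
summary = coordinate_summary(sample)
print(summary["incidents"], summary["unique_points"])  # 4 2
```

Repeated points matter for the clustering step, because several incidents at the same coordinates pull a K-Means centroid toward that location more strongly than a single incident would.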

Download & explore

Start with the PDF for a quick visual overview, then open the .pbix in Power BI Desktop to explore interactive slicers and tooltips.